Assignments: Class 1 Assignment Question 1
I propose a high-throughput microscopy tool to estimate intracellular PHA accumulation from granule count and size.
Current standard quantification methods are slow, labor-intensive, and often require hazardous solvent-based extraction. By pairing PHA staining (e.g., Sudan Black B or Nile Red) with automated imaging and machine-learning (ML) image segmentation, this approach could rapidly screen large libraries of environmental isolates and recombinant strains for high PHA producers.
Homework Part 0: Basics of Gel Electrophoresis
I have watched the Week 2 lecture and recitation on DNA read/write/edit, restriction digests, Benchling, Twist, and gel electrophoresis.
Part 1: Benchling & In-silico Gel Art
Opened Benchling and signed up. Found the Lambda sequence here and copied the sequence without the header. Pasted this sequence into Benchling through “Create” > “DNA / RNA Sequence” > “New DNA / RNA Sequence”. Then I just pasted the sequence in the “Bases” field, titled it “Lambda,” and selected the topology as “Linear.”
Python Script for Opentrons Artwork Here’s my HTGAA 2026 Opentrons Art Python Script Submission.
The artistic design I created using the GUI is available here.
I heavily used the “Example 7 Microbial Earth” by Dominika Wawrzyniak, using pixels loaded from an external resource (a CSV file hosted on my GitHub page).
I used Dominika’s well documented Notion page from HTGAA21 to understand the code and replicate it for my case. I used Gemini assistance only to debug minor typos and syntax errors, and to identify which packages to import to execute the code.
Homework: Protein Design I Part A. Conceptual Questions 1) How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
About 21% of meat is protein (Smith et al. 2022); therefore, 500 g of meat contains about 105 g of protein.
Part A: SOD1 Binder Peptide Design (From Pranam) Part 1: Generate Binders with PepMLM
Question 1
This is the human SOD1 sequence from UniProt (P00441), with the initial Met removed:
ATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
Introducing the A4V mutation, associated with the most aggressive forms of ALS, gives:
ATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
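As a minimal sketch of how the A4V substitution can be applied programmatically, the snippet below uses a hypothetical helper, `apply_point_mutation`, on the first ten residues of the Met-removed sequence. The helper name and the choice to work on a short prefix are illustrative assumptions, not part of the original notebook.

```python
def apply_point_mutation(seq: str, mutation: str) -> str:
    """Apply a substitution written like 'A4V' (1-based position)."""
    wild_type = mutation[0]
    position = int(mutation[1:-1])
    mutant = mutation[-1]
    index = position - 1
    # Guard against numbering mistakes (e.g. counting the initial Met).
    if seq[index] != wild_type:
        raise ValueError(
            f"Expected {wild_type} at position {position}, found {seq[index]}"
        )
    return seq[:index] + mutant + seq[index + 1:]


# First 10 residues of the Met-removed SOD1 sequence (illustrative prefix only).
sod1_prefix = "ATKAVCVLKG"
print(apply_point_mutation(sod1_prefix, "A4V"))
```

The residue check is the useful part of the design: it fails loudly if the numbering convention (with or without the initial Met) does not match the sequence being edited.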
Question 2 and 3
With the help of ChatGPT and Gemini, I added 2 new cells in order to generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.
Assignment: DNA Assembly Question 1: What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
Phusion High-Fidelity PCR Master Mix is a 2X, ready-to-use mixture where the exact formulation is partly proprietary, but the functional components are documented in the manufacturer’s manual:
| Component (Phusion 2X Master Mix) | Purpose |
| --- | --- |
| Phusion High-Fidelity DNA Polymerase | DNA synthesis with high fidelity + proofreading |
| dNTPs (dATP, dCTP, dGTP, dTTP) | Building blocks for new DNA strands |
| HF reaction buffer (salts + pH buffer) | Maintains optimal pH/ionic strength for enzyme function |
| Mg2+ (via buffer system; often MgCl2-derived) | Essential polymerase cofactor |
| Stabilizers / additives (partly proprietary) | Improve enzyme stability and consistency |
| Nuclease-free water | Solvent to reach correct 2X working concentrations |

Reference: Thermo Fisher Phusion High-Fidelity DNA Polymerase Product Information Sheet, standard biochemistry manuals (e.g., Sambrook & Russell).
Assignment Part 1: Intracellular Artificial Neural Networks (IANNs) Question 1
Traditional genetic circuits are usually implemented in Boolean logic (ON/OFF) and hand-designed as fixed logic, so representing nuanced behaviors often requires many gates, sharp thresholds, and careful tuning, which can make designs bulky and brittle. As the number of inputs grows, circuit complexity can explode combinatorially: stacking multiple layers and adding intermediate nodes increases metabolic load, failure points, and sensitivity to part-to-part variability. Also, adapting to new targets or a shifting biological context often means redesigning the circuit architecture, not just re-tuning parameters.
Homework Part A: General and Lecturer-Specific Questions General homework questions Exercise 1
Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.
Homework: Final Project What to measure?
I will measure visible melanin output in the material as the primary readout of the project.
I want to quantify:
• Degree of darkening
• Spatial distribution of pigmentation
• Stability/persistence of the pigmentation in the bacterial cellulose after drying or storage
These measurements are directly relevant because they indicate whether the melanin-producing system is functioning and whether the output is compatible with the intended material application.
Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork I contributed 7 pixels to the global artwork experiment, helping extend a horizontal yellow line in the top-left area (see screenshot below).
At first, I was cautious and tried to understand the ongoing ideas for each section and whether there was a unifying concept. I considered introducing something new, but ultimately decided to stick with what seemed to be the area’s goal (a horizontal yellow line). For next year, it might be fun to have an in-app chat within the same domain to coordinate contributions more easily and check the current vibes.
I worked on my Final Project and prepared it for the presentation on May 13 as part of the Committed Listeners group.
Subsections of Homework
Week 1 HW: Principles and Practices
Assignments: Class 1 Assignment
Question 1
I propose a high-throughput microscopy tool to estimate intracellular PHA accumulation from granule count and size.
Current standard quantification methods are slow, labor-intensive, and often require hazardous solvent-based extraction. By pairing PHA staining (e.g., Sudan Black B or Nile Red) with automated imaging and machine-learning (ML) image segmentation, this approach could rapidly screen large libraries of environmental isolates and recombinant strains for high PHA producers.
Future upgrades, offered as a premium beta for testing, could add a “material profile” output by predicting PHA chain-length class (SCL, MCL, or LCL) from staining/fluorescence response patterns using the lipophilic dyes. This would enable not only faster strain selection but also early-stage differentiation of polymer type, which is critical for downstream biotechnology applications.
A further upgrade could generate image-driven optimization suggestions from microscopy images. For example, if it detects a high level of extracellular debris consistent with cell lysis, or a high abundance of product granules outside the cells, it could recommend exploring strain-engineering strategies that alter cell membrane composition to increase tolerance to mechanical stress and support higher intracellular polymer accumulation as cytoplasmic granules.
Question 2
Gov / Policy Goal 1: Prevent harmful misuse
• Sub-goal 1.1 - Limit repurposability: Reduce the extent to which the tool can be used as a general-purpose and high-throughput optimization engine outside its intended PHA scope, for example by restricting supported dyes and limiting microscopy calibration parameters to validated settings.
• Sub-goal 1.2 - Increase accountability: ensure high-impact uses are traceable and that institutions have a mechanism to intervene if misuse is suspected.
Gov / Policy Goal 2: Promote safe, responsible operation and research integrity
• Sub-goal 2.1 - Standardize safe use: Require adherence to Standard Operating Procedures (SOPs) for staining, imaging, and waste handling.
• Sub-goal 2.2 - Ensure competent users: Require completion of a short training module, including lab safety + tool-specific quality control (QC) before users can access advanced features or export “final” reports.
• Sub-goal 2.3 - Maintain data quality: Require basic QC checks (controls, calibration, and logging of model version and imaging settings) to reduce false positives/negatives and prevent misinterpretations.
Gov / Policy Goal 3: Maintain access for constructive uses (equity and scientific progress)
• Sub-goal 3.1 - Preserve legitimate research utility: avoid governance mechanisms that unnecessarily slow routine PHA research and screening.
• Sub-goal 3.2 - Proportional governance: apply stricter controls only to higher-impact capabilities (e.g., advanced optimization suggestions), rather than restricting all use.
Question 3
Option 1:
General action: Norms combined with oversight mechanisms (social/regulatory governance)
Purpose: Currently, PHA quantification is typically validated through chemical extraction and analytical methods rather than standardized image-based measurement. A robust image-analysis tool like this would significantly increase throughput and expand where and how screening can be performed. If an image-analysis approach is positioned as a scalable screening tool, it should include safeguards to prevent use outside validated conditions. A responsible-use policy with “red flag” triggers would provide a proportional oversight mechanism.
Design:
• Actors: principal investigators (PIs) and laboratory personnel (primary users), microscopy core facility staff, the university biosafety office (or equivalent), and an institutional ethics/biosafety committee.
• Mechanism: implement a short pre-use declaration form and a responsible-use policy that defines “red flag” contexts (e.g., high-throughput work on unverified environmental isolates without provenance, use outside standard biosafety environments, or attempts to generalize the tool beyond PHA workflows).
• Trigger response: if a red flag is triggered, require review by the biosafety/ethics committee (or the biosafety office) and compliance with institutional requirements before experiments or tool access continue.
Assumptions:
• Users will accurately disclose the intended use and experimental context (or there will be sufficient deterrence to reduce misreporting).
• Red-flag criteria can be defined clearly enough to be actionable and consistent across labs.
• The institution has capacity to perform timely reviews without creating major delays for legitimate projects.
• Some level of auditing is feasible (e.g., metadata logs or usage reporting), which may require limited access to usage data.
Risks of failure and “success”:
• The policy becomes symbolic and is not followed; criteria are too vague to enforce; or users misreport their purpose to avoid review.
• Overly broad triggers could make oversight routine, slowing research and disproportionately burdening smaller or under-resourced labs (equity and access concerns).
Option 2:
Restrict advanced features: High-impact features require auditable access (accountability governance)
Purpose: Add accountability for higher-impact features while keeping basic screening broadly accessible.
Design:
• Actors: tool developers (academic or company), institutions adopting the tool.
• Baseline access: basic PHA screening module available for standard use.
• Advanced access (premium/beta): requires institutional opt-in (verified affiliation, training completion, and standard operating procedures adherence).
• Logging: maintain run logs with technical metadata only (model version, stain, imaging settings, quality control pass/fail, solvent/waste metadata, etc.).
• Incident response: provide an incident-reporting channel so access can be suspended if misuse is suspected.
Assumptions:
• Logging and gating deter misuse without driving users to ungoverned copies.
• Metadata-only logs are sufficient for accountability without compromising privacy.
• Institutions are willing to administer opt-in and training requirements.
Risks of failure and “success”:
• Users bypass controls by using modified versions or alternative tools; logging becomes incomplete.
• Reduced accessibility and higher admin burden, potentially concentrating access in well-resourced labs.
• Analogy: similar to “KYC tiers” in financial systems: more powerful capabilities require stronger verification and auditability.
Option 3:
Just for PHA: Scope capabilities through validated workflows (technical strategy / design constraint).
Purpose: General-purpose screening tools are easier to repurpose. One way to limit their repurposability is by restricting the tool to validated PHA workflows.
Design:
• Actors: tool developers and maintainers; optionally journals or core facilities that require validated workflows for reporting.
• Technical constraint: restrict supported dyes and workflows to PHA-relevant staining and analysis; lock calibration parameters to validated microscopy settings; exclude generic “optimize any phenotype” modules.
• Reporting constraint: outputs are labeled as screening support, with clear limits on claims and recommended confirmatory methods for final quantification.
Assumptions:
• The validated workflow remains useful across common lab setups and organisms.
• Users accept constraints rather than abandoning the tool.
Risks of failure and “success”:
• Restrictions are easily removed in forks or hacks; scope limits become ineffective.
• Reduced scientific and commercial usefulness, including for ethically beneficial non-PHA applications; may slow innovation.
• This is analogous to 3D printers that restrict materials and firmware settings: the core function remains available, but out-of-scope production becomes harder without intentional modification.
Question 4
| Does the option: | Option 1 | Option 2 | Option 3 |
| --- | --- | --- | --- |
| Enhance Biosecurity | | | |
| • By preventing incidents | 2 | 1 | 3 |
| • By helping respond | 2 | 1 | 3 |
| Foster Lab Safety | | | |
| • By preventing incidents | 2 | 2 | 1 |
| • By helping respond | 3 | 1 | 3 |
| Protect the environment | | | |
| • By preventing incidents | 2 | 1 | 2 |
| • By helping respond | 3 | 2 | 3 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 2 | 3 | 1 |
| • Feasibility? | 2 | 2 | 1 |
| • Not impede research | 2 | 3 | 3 |
| • Promote constructive applications | 2 | 1 | 3 |
Question 5
I would prioritize Option 3 as the primary governance approach, aimed at tool developers and maintainers. Although Option 3 has the weakest overall score, I assign higher weight to practical implementability and consistent adoption, since governance mechanisms that require sustained oversight or significant administrative capacity are often applied inconsistently in real research settings. Option 3 can be implemented directly in software and routine workflows by restricting the tool to validated PHA use cases (supported dyes, locked calibration ranges, and scoped outputs). This reduces repurposability by design rather than relying on user compliance, making the default use safer and more predictable while preserving the core constructive application: scalable PHA screening.
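The trade-off between the raw scores and a weighting that favors implementability can be sketched as follows. This assumes that 1 is the best score and 3 the worst (consistent with Option 3 "scoring poorly" where it has 3s), and the weight of 5 on cost and feasibility is a purely hypothetical value chosen for illustration, not one used in the assignment.

```python
# Scores from the Question 4 matrix, in row order (assumption: 1 = best, 3 = worst).
CRITERIA = [
    "biosecurity: prevent", "biosecurity: respond",
    "lab safety: prevent", "lab safety: respond",
    "environment: prevent", "environment: respond",
    "cost/burden", "feasibility", "not impede research", "constructive uses",
]
SCORES = {
    "Option 1": [2, 2, 2, 3, 2, 3, 2, 2, 2, 2],
    "Option 2": [1, 1, 2, 1, 1, 2, 3, 2, 3, 1],
    "Option 3": [3, 3, 1, 3, 2, 3, 1, 1, 3, 3],
}


def weighted_totals(scores, weights):
    """Return the weighted sum per option (lower is better)."""
    return {
        option: sum(w * s for w, s in zip(weights, values))
        for option, values in scores.items()
    }


# Unweighted totals: every criterion counts once.
equal_weights = [1] * len(CRITERIA)
totals = weighted_totals(SCORES, equal_weights)

# Hypothetical weighting that emphasizes cost/burden and feasibility.
practical_weights = [5 if c in ("cost/burden", "feasibility") else 1 for c in CRITERIA]
practical_totals = weighted_totals(SCORES, practical_weights)

print(totals)
print(practical_totals)
```

With equal weights Option 3 has the highest (weakest) total, while a strong enough emphasis on cost and feasibility moves it to the lowest total, which is the reasoning behind the recommendation.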
The key trade-off is that Option 3 scores poorly on “helping respond” (biosecurity and lab safety), because it provides limited traceability and fewer mechanisms for intervention after deployment. It also narrows beneficial extensions beyond PHA, potentially limiting constructive applications in adjacent domains.
This recommendation also rests on several assumptions and uncertainties: that capability scoping meaningfully reduces repurposability in practice; that users will not widely circumvent constraints via modified versions or alternative tools; and that the validated workflow generalizes across common microscopes, organisms, and staining conditions.
Final Reflection
The main new ethical concern for me was how quickly a tool designed for a narrow, constructive purpose (PHA screening) can become a general “scale-up enabler” once it is automated and paired with machine-learning image analysis. To address this, I would recommend capability scoping by restricting the tool to validated PHA workflows (supported dyes, locked calibration ranges, and scoped outputs).
Week 2 Lecture Prep
Homework Questions from Professor Jacobson:
Question 1
High-fidelity, proofreading-proficient replicative DNA polymerases have an error rate of ≈ 10⁻⁶ per base during synthesis under standard conditions. The human nuclear genome is about 3.2 × 10⁹ base pairs per haploid set. If errors occurred at 10⁻⁶ per base, you would expect roughly 3.2 × 10⁹ × 10⁻⁶ ≈ 3.2 × 10³ (≈ 3,200) errors per haploid genome copy. In living cells, however, the effective replication error rate is far lower once post-replication repair (such as mismatch repair, MMR) acts on top of polymerase proofreading (3′→5′ exonuclease): a commonly cited order of magnitude is ≈ 10⁻⁹ to 10⁻¹⁰ errors per base pair per replication.
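The arithmetic above can be checked with a few lines of Python. This is a minimal sketch using only the genome size and error rates quoted in the answer; the function name `expected_errors` is an illustrative choice.

```python
GENOME_BP = 3.2e9  # haploid human genome size quoted above


def expected_errors(error_rate_per_bp: float, genome_bp: float = GENOME_BP) -> float:
    """Expected number of errors in one copy of the genome."""
    return error_rate_per_bp * genome_bp


# Polymerase alone vs. the effective in-cell range after repair.
for rate in (1e-6, 1e-9, 1e-10):
    print(f"rate {rate:.0e} per bp -> {expected_errors(rate):,.2f} errors per haploid copy")
```

At 10⁻⁶ this gives about 3,200 errors per copy, while the in-cell range of 10⁻⁹ to 10⁻¹⁰ corresponds to roughly 3 down to 0.3 errors per copy.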
Question 2
Because of codon degeneracy, the same amino-acid sequence can be encoded by many DNA coding sequences. A rough average multiplicity is about 3.05 synonymous codons per amino acid (61 sense codons / 20 amino acids). Given an average human protein-coding sequence of 1036 bp and 3 bp per amino acid, 1036 bp / 3 ≈ 345 codons. So the number of different DNA coding sequences that produce the exact same protein is on the order of 3.05³⁴⁵ ≈ 10¹⁶⁷.
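The estimate can be reproduced in log space, since the raw number is far too large to be worth printing. This is a rough sketch that treats every position as having the same average multiplicity (61 sense codons over 20 amino acids), which is a simplifying assumption; real proteins have position-specific degeneracy.

```python
import math

SENSE_CODONS = 61   # 64 codons minus 3 stop codons
AMINO_ACIDS = 20


def log10_synonymous_sequences(coding_length_bp: int) -> float:
    """log10 of the approximate number of coding sequences for one protein."""
    avg_codons_per_aa = SENSE_CODONS / AMINO_ACIDS  # about 3.05
    n_codons = coding_length_bp // 3
    return n_codons * math.log10(avg_codons_per_aa)


exponent = log10_synonymous_sequences(1036)
print(f"about 10^{exponent:.0f} synonymous coding sequences")
```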
In practice, though, synonymous variants are not always functionally equivalent. Some synonymous changes produce transcripts with different stability and structure. For example, synonymous substitutions can lead to hairpins or repetitive motifs that increase recombination and reduce construct stability. They can also change ribosome speed patterns (which can alter co-translational folding and lead to misfolding, aggregation, or altered activity). In addition, synonymous changes can inadvertently create or disrupt regulatory sequence motifs (e.g., polyadenylation signals or splicing enhancer/silencer elements in eukaryotes).
Homework Questions from Dr. LeProust:
The gold standard for oligonucleotide synthesis is solid-phase oligonucleotide synthesis (SPOS) based on phosphoramidite chemistry (Walther et al. 2020). However, this method struggles beyond ~200 nt because every nucleotide is added through repeated chemical cycles, and small inefficiencies, truncation products, depurination, and side reactions compound with length. For the same reason, a 2000 bp gene cannot be made reliably by direct oligo synthesis. Instead, long genes are typically assembled from shorter oligos or DNA fragments, followed by error correction, cloning, and sequence verification.
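A common way to see why length is limiting is to compound a per-cycle coupling efficiency over the whole oligo. The sketch below assumes a coupling efficiency of 99.5% per cycle, which is an illustrative value and not a figure from the lecture, and it ignores depurination and other side reactions.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float = 0.995) -> float:
    """Fraction of strands that are full length after (length_nt - 1) couplings."""
    return coupling_efficiency ** (length_nt - 1)


for length in (20, 200, 2000):
    fraction = full_length_fraction(length)
    print(f"{length:>5} nt: {fraction:.2e} of strands are full length")
```

Under this assumption roughly a third of strands are full length at 200 nt, and essentially none are at 2000 nt, which is why long genes are assembled from shorter pieces.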
Homework Question from George Church:
Question: What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?
Answer: The 10 essential amino acids in all animals are Arginine, Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, and Valine. Considering this, Jurassic Park’s biocontainment method is a joke, since it doesn’t create a unique dependency in animals: animals already can’t synthesize lysine. Also, as containment-by-dependency, it’s ecologically leaky because they did not consider the possibility that lysine was readily available in the environment. Lysine is available via plants and prey, so escape doesn’t remove access.
I have watched the Week 2 lecture and recitation on DNA read/write/edit, restriction digests, Benchling, Twist, and gel electrophoresis.
Part 1: Benchling & In-silico Gel Art
Opened Benchling and signed up. Found the Lambda sequence here and copied the sequence without the header. Pasted this sequence into Benchling through “Create” > “DNA / RNA Sequence” > “New DNA / RNA Sequence”. Then I just pasted the sequence in the “Bases” field, titled it “Lambda,” and selected the topology as “Linear.”
Clicked “Digest” (the scissors icon in the right menu), selected “All enzymes,” found all seven using the search tool, and clicked “Run Digest.”
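What the digest tool computes can be sketched for a single enzyme on a linear molecule. The example uses EcoRI (recognition site GAATTC, cut after the first G) on a made-up toy sequence rather than the Lambda genome; the function name and the toy sequence are assumptions for illustration, and this is not how Benchling is implemented.

```python
def digest_linear(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths (in order) for a linear sequence cut at every site."""
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)  # cut position within the sequence
        start = seq.find(site, start + 1)
    boundaries = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])]


# Toy 21-nt sequence with two EcoRI sites (not real Lambda DNA).
toy = "AAA" + "GAATTC" + "TTTT" + "GAATTC" + "CC"
fragments = digest_linear(toy, site="GAATTC", cut_offset=1)
print(sorted(fragments, reverse=True))  # largest band first, as on a gel
```

Sorting the fragment lengths in descending order mirrors the top-to-bottom band order in a gel lane.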
This in-silico gel image uses simulated Lambda DNA restriction digest banding patterns from the required enzymes and arranges them as a visual composition inspired by Paul Vanouse’s Latent Figure Protocol.
Part 2: Gel Art - Restriction Digests and Gel Electrophoresis
I did not complete the wet-lab restriction digest and gel electrophoresis experiment. As a Committed Listener, I completed the required in-silico gel design in Benchling, but I did not have lab access for the optional wet-lab portion.
Part 3: DNA Design Challenge
3.1. Choose your protein: Poly(3-hydroxyalkanoate) polymerase subunit PhaC
I chose Polyhydroxyalkanoate synthase (PhaC) because it is involved in the catalysis of the reaction that polymerizes (R)-3-hydroxybutyryl-CoA to produce polyhydroxybutyrate (PHB), which is an important bioproduct of interest due to its plastic/polyethylene-like properties.
Biologically, PHB serves as an intracellular energy reserve material when cells grow under conditions of nutrient limitation.
Sequence of Polyhydroxyalkanoate Synthase (PhaC):
MATGKGAAASTQEGKSQPFKVTPGPFDPATWLEWSRQWQGTEGNGHAAASGIPGLDALAGVKIAPAQLGDIQQRYMKDFSALWQAMAEGKAEATGPLHDRRFAGDAWRTNLPYRFAAAFYLLNARALTELADAVEADAKTRQRIRFAISQWVDAMSPANFLATNPEAQRLLIESGGESLRAGVRNMMEDLTRGKISQTDESAFEVGRNVAVTEGAVVFENEYFQLLQYKPLTDKVHARPLLMVPPCINKYYILDLQPESSLVRHVVEQGHTVFLVSWRNPDASMAGSTWDDYIEHAAIRAIEVARDISGQDKINVLGFCVGGTIVSTALAVLAARGEHPAASVTLLTTLLDFADTGILDVFVDEGHVQLREATLGGGAGAPCALLRGLELANTFSFLRPNDLVWNYVVDNYLKGNTPVPFDLLFWNGDATNLPGPWYCWYLRHTYLQNELKVPGKLTVCGVPVDLASIDVPTYIYGSREDHIVPWTAAYASTALLANKLRFVLGASGHIAGVINPPAKNKRSHWTNDALPESPQQWLAGAIEHHGSWWPDWTAWLAGQAGAKRAAPANYGNARYRAIEPAPGRYVKAKA
Source: UniProt at https://www.uniprot.org/uniprotkb/P23608/entry#sequences
3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.
reh:H16_A1437 K03821 poly(R)-3-hydroxyalkanoate polymerase subunit PhaC EC:2.3.1.304 | (GenBank) phaC1; Poly(3-hydroxybutyrate) polymerase (N)
atggcgaccggcaaaggcgcggcagcttccacgcaggaaggcaagtcccaaccattcaaggtcacgccggggccattcgatccagccacatggctggaatggtcccgccagtggcagggcactgaaggcaacggccacgcggccgcgtccggcattccgggcctggatgcgctggcaggcgtcaagatcgcgccggcgcagctgggtgatatccagcagcgctacatgaaggacttctcagcgctgtggcaggccatggccgagggcaaggccgaggccaccggtccgctgcacgaccggcgcttcgccggcgacgcatggcgcaccaacctcccatatcgcttcgctgccgcgttctacctgctcaatgcgcgcgccttgaccgagctggccgatgccgtcgaggccgatgccaagacccgccagcgcatccgcttcgcgatctcgcaatgggtcgatgcgatgtcgcccgccaacttccttgccaccaatcccgaggcgcagcgcctgctgatcgagtcgggcggcgaatcgctgcgtgccggcgtgcgcaacatgatggaagacctgacacgcggcaagatctcgcagaccgacgagagcgcgtttgaggtcggccgcaatgtcgcggtgaccgaaggcgccgtggtcttcgagaacgagtacttccagctgttgcagtacaagccgctgaccgacaaggtgcacgcgcgcccgctgctgatggtgccgccgtgcatcaacaagtactacatcctggacctgcagccggagagctcgctggtgcgccatgtggtggagcagggacatacggtgtttctggtgtcgtggcgcaatccggacgccagcatggccggcagcacctgggacgactacatcgagcacgcggccatccgcgccatcgaagtcgcgcgcgacatcagcggccaggacaagatcaacgtgctcggcttctgcgtgggcggcaccattgtctcgaccgcgctggcggtgctggccgcgcgcggcgagcacccggccgccagcgtcacgctgctgaccacgctgctggactttgccgacacgggcatcctcgacgtctttgtcgacgagggccatgtgcagttgcgcgaggccacgctgggcggcggcgccggcgcgccgtgcgcgctgctgcgcggccttgagctggccaataccttctcgttcttgcgcccgaacgacctggtgtggaactacgtggtcgacaactacctgaagggcaacacgccggtgccgttcgacctgctgttctggaacggcgacgccaccaacctgccggggccgtggtactgctggtacctgcgccacacctacctgcagaacgagctcaaggtaccgggcaagctgaccgtgtgcggcgtgccggtggacctggccagcatcgacgtgccgacctatatctacggctcgcgcgaagaccatatcgtgccgtggaccgcggcctatgcctcgaccgcgctgctggcgaacaagctgcgcttcgtgctgggtgcgtcgggccatatcgccggtgtgatcaacccgccggccaagaacaagcgcagccactggactaacgatgcgctgccggagtcgccgcagcaatggctggccggcgccatcgagcatcacggcagctggtggccggactggaccgcatggctggccgggcaggccggcgcgaaacgcgccgcgcccgccaactatggcaatgcgcgctatcgcgcaatcgaacccgcgcctgggcgatacgtcaaagccaaggcatga
Source: KEGG at https://www.genome.jp/dbget-bin/www_bget?reh:H16_A1437
3.3. Codon optimization.
I optimized the phaC coding sequence for E. coli because it is a widely used chassis for recombinant protein expression and for rapid prototyping of metabolic engineering constructs.
I did this using the Benchling tool. I selected the region of the amino-acid sequence I wished to back-translate and right-clicked the highlighted region. In the codon optimization tab, I used these settings:
Host: E. coli K-12
Method: Match codon usage
GC content: Medium (0.33 to 0.66) because extreme GC content can create problems. High GC can create strong secondary structures and low GC can cause instability/repeats and can make synthesis harder.
Uridine depletion: off (not relevant for bacterial expression)
Hairpin parameters: Stem size: 8 and Window 50
Restriction sites: avoid BsaI, BsmBI, BbsI (Type IIS enzymes, for Golden Gate compatibility, since I would need to clone phaA and phaB alongside phaC in one vector rather than phaC alone)
Patterns to reduce: AAAAAA and ATATATATA
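The settings above can be double-checked on the optimized output with a short script. This is a minimal sketch, not Benchling's algorithm: it scans both strands for the three Type IIS recognition sites and reports GC fraction, and the test sequence is a made-up string rather than the optimized phaC.

```python
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC", "BbsI": "GAAGAC"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_forbidden_sites(seq: str) -> dict[str, int]:
    """Count Type IIS sites on either strand; enzymes with no hits are omitted."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in TYPE_IIS_SITES.items():
        count = seq.count(site) + seq.count(reverse_complement(site))
        if count:
            hits[enzyme] = count
    return hits


# Made-up test sequence: one BsaI site (forward) and one BbsI site (reverse strand).
example = "ATG" + "GGTCTC" + "AAA" + "GTCTTC" + "GG"
print(find_forbidden_sites(example), f"GC = {gc_fraction(example):.2f}")
```

Checking the reverse complement matters because these sites are not palindromic, so a site on the bottom strand looks different on the top strand.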
I clicked on “Optimization preview” and got this result:
3.4. You have a sequence! Now what?
PhaC alone will not produce PHB. A minimal PHB pathway typically includes PhaA (β-ketothiolase) and PhaB (acetoacetyl-CoA reductase) in addition to PhaC (PHA synthase). PhaA and PhaB convert central metabolites (via acetyl-CoA) into (R)-3-hydroxybutyryl-CoA, which is the direct substrate that PhaC polymerizes into PHB. You will also need a host capable of supplying sufficient acetyl-CoA and NADPH.
Therefore, for PHB production in E. coli, phaA, phaB, and phaC are commonly co-expressed on the same plasmid (as a single operon with one promoter and RBSs for each gene) and grown under appropriate culture conditions (e.g., carbon excess and nutrient limitation) that favor polymer accumulation.
To produce the protein from DNA, the codon-optimized phaC sequence would be placed in an expression cassette with a promoter, RBS, start codon, coding sequence, stop codon, and terminator. In a cell-dependent system such as E. coli, RNA polymerase transcribes the DNA sequence into mRNA. The ribosome binds the RBS, reads the mRNA codons, and translates them into the PhaC amino-acid chain. For PHB production rather than PhaC expression alone, phaA, phaB, and phaC should be co-expressed so the host can convert acetyl-CoA into (R)-3-hydroxybutyryl-CoA and then polymerize it into PHB.
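The transcription and translation steps described above can be illustrated with the standard genetic code. The sketch below builds the codon table from the conventional TCAG ordering and translates the first six codons of the native phaC sequence from section 3.2, with a stop codon appended for illustration; it ignores RBS recognition and all regulation.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA to mRNA (T replaced by U)."""
    return dna.upper().replace("T", "U")


def translate(dna: str) -> str:
    """Translate coding-strand DNA from its first base until a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 18 nt of the native phaC sequence, plus a stop codon added for illustration.
phac_start = "atggcgaccggcaaaggc" + "tga"
print(transcribe(phac_start))
print(translate(phac_start))
```

The translated prefix should match the first residues of the PhaC protein sequence listed in section 3.1.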
Part 4: Prepare a Twist DNA Synthesis Order
Project: pBBR1MCS-5::phaCAB
Cell-dependent recombinant expression approach: cloning the codon-optimized phaA, phaB, and phaC coding sequences into E. coli K-12
The screenshot shows that my Twist account was redirected to “Contact Your Distributor” for orders through Interprise USA Corp., and another page returned an HTTP 500 server error.
Part 5: DNA Read / Write / Edit
5.1 DNA Read
I would sequence DNA used for DNA-based digital data storage because I am interested in how biological molecules can encode digital information. It would be fascinating to recover stored information from DNA as if reading an archive.
I would use Illumina sequencing, a second-generation massively parallel short-read technology, for high-accuracy base calls and reliable decoding of short oligo pools. I would also consider Oxford Nanopore sequencing, a third-generation single-molecule long-read technology, to validate longer constructs and check sequence integrity.
For Illumina, the input would be a pool of synthetic DNA oligos encoding digital data. If the oligos are already short, fragmentation may not be necessary. Library preparation would involve adapter ligation or PCR addition of adapters/indexes, followed by sequencing-by-synthesis using fluorescent reversible terminators. The output would be millions to billions of short reads in FASTQ format with per-base quality scores. The stored data would then be decoded using alignment, consensus generation, and error correction.
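The consensus step in decoding can be sketched as a per-position majority vote over redundant reads of the same oligo. This assumes the reads are already aligned and of equal length, which is a strong simplification; real pipelines also handle insertions, deletions, and quality scores.

```python
from collections import Counter


def majority_consensus(reads: list[str]) -> str:
    """Per-position majority vote over pre-aligned, equal-length reads."""
    if len({len(read) for read in reads}) != 1:
        raise ValueError("reads must be aligned to the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


# Four hypothetical reads of one oligo, each with at most one substitution error.
reads = ["ACGT", "ACGA", "ACGT", "TCGT"]
print(majority_consensus(reads))
```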
5.2 DNA Write
I would synthesize a PHA production cassette for E. coli K-12 containing codon-optimized phaA, phaB, and phaC. The goal would be to rapidly test/study PHB production from a designed pathway rather than cloning each gene manually from genomic DNA.
I would use commercial gene synthesis, such as Twist, because it allows designed DNA sequences to be ordered directly with defined codon usage, avoided restriction sites, and synthesis constraints. The essential steps are: design the coding sequences, codon-optimize them for E. coli, add regulatory parts such as promoter/RBSs/terminator, screen for forbidden restriction sites and problematic repeats, synthesize short oligos, assemble them into longer fragments or a full insert, clone into a plasmid, and verify the final sequence.
The main limitations are length-dependent error accumulation, synthesis difficulty from repeats or extreme GC content, turnaround time, cost for long constructs, and the need for clonal verification before experimental use.
5.3 DNA Edit
To increase phaCAB expression and improve PHA production, I would edit E. coli metabolic and stress-tolerance genes to raise PHB yield. For example, I would target pathways that improve acetyl-CoA/NADPH supply, reduce competing carbon sinks, and increase tolerance to intracellular polymer accumulation.
For precise point mutations, I would use CRISPR base editing or prime editing because these methods can introduce targeted sequence changes without relying on double-strand breaks. For larger edits or gene insertions, I would use Cas9-assisted homologous recombination with a donor DNA template.
The design steps would include selecting the target gene, designing guide RNAs, checking off-target risk, preparing the editor plasmid or Cas9/gRNA system, designing the donor template if needed, transforming E. coli, selecting edited colonies, and confirming edits by sequencing.
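The guide-selection step can be sketched as a scan for 20-nt protospacers followed by an NGG PAM, which is the requirement for SpCas9. Using SpCas9 is an assumption here, since the text does not name a specific nuclease, and the sketch only scans the forward strand without any off-target scoring.

```python
import re

# Lookahead so that overlapping candidate sites are all reported.
PAM_PATTERN = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def find_spcas9_targets(seq: str) -> list[tuple[int, str, str]]:
    """Return (start, protospacer, PAM) for every forward-strand NGG site."""
    seq = seq.upper()
    return [
        (match.start(), match.group(1), match.group(2))
        for match in PAM_PATTERN.finditer(seq)
    ]


# Toy 23-nt target: a 20-nt protospacer followed by a TGG PAM.
toy_target = "ACGTACGTACGTACGTACGT" + "TGG"
print(find_spcas9_targets(toy_target))
```

A fuller design would also scan the reverse complement and rank candidates by predicted off-target risk, as listed in the steps above.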
Limitations include editing efficiency, PAM constraints, off-target edits, toxicity from editor expression, and the increased screening burden when multiplexing several edits.
The artistic design I created using the GUI is available here.
I heavily used the “Example 7 Microbial Earth” by Dominika Wawrzyniak, using pixels loaded from an external resource (a CSV file hosted on my GitHub page).
I used Dominika’s well documented Notion page from HTGAA21 to understand the code and replicate it for my case. I used Gemini assistance only to debug minor typos and syntax errors, and to identify which packages to import to execute the code.
Like Dominika Wawrzyniak, I planned to introduce more colors, like in the image I generated in the Automation Art Interface. However, implementing this design into code turned out to be more difficult and tedious than anticipated, so I left it as one color (red).
I submitted the Python file through the required homework submission form.
As a Committed Listener, I prepared the script and design documentation, but I did not run the protocol on a physical Opentrons robot.
Post-Lab Questions
Question 1
The paper “High-throughput experimentation for discovery of biodegradable polyesters” (Fransen et al., 2023) uses an Opentrons 1st-generation robot to automate a high-throughput biodegradation assay based on the clear-zone technique.
The researchers synthesized 642 polyesters and polycarbonates and tested their biodegradability using a clear-zone assay with Pseudomonas lemoignei. The Opentrons robot was repurposed as an automated imaging platform to capture time-lapse images of polymer degradation in 12-well plates, enabling consistent, large-scale monitoring over 13 days.
This automation allowed rapid generation of a large biodegradation dataset and supported machine learning models to predict polymer degradability from chemical structure.
Question 2
High-throughput screening of bacterial isolates for PHA production is traditionally extremely time-consuming and labor-intensive, requiring manual handling of hundreds of colonies across multiple conditions. For my final project, I plan to use an Opentrons OT-2 liquid-handling robot to automate this workflow, dramatically increasing throughput, reproducibility, and consistency compared with the manual methods I used during my master’s.
Isolates will be spotted in triplicate on 60-sector plates, maintaining identical indexed positions across all plates for direct comparison. Viability will first be confirmed on LB agar, and isolates will then be inoculated onto mineral medium (MM; Ramsay et al., 1990) agar plates supplemented with individual carbon sources at 10% v/v to reach typical screening concentrations.
PHA production and bacterial growth will be assessed using a two-step staining workflow. First, Sudan Black B (0.02% in 96% ethanol, followed by ethanol washes) will identify colonies with blue coloration indicative of polymer accumulation. Second, Nile Red incorporated into MM (0.5 μg/mL) will allow selected isolates to be ranked based on fluorescence under UV light (312/365 nm).
This automated setup enables rapid testing of hundreds of isolate × carbon source combinations, accelerating the discovery of strains compatible with low-cost feedstocks and efficient bioprocessing while transforming a laborious manual process into a precise, scalable screening platform.
Each “color” would correspond to a different bacterial isolate. I have not implemented this in the script yet. The coordinate set is a starting layout and could be refined to achieve a more uniform, regular distribution across the plate (as in the image I drafted using the GUI, available below).
Final Project Ideas
Added 3 slides with 3 ideas for an Individual Final Project to the appropriate slide deck for Committed Listeners here.
Also, here’s my analog brainstorm.
Week 4 HW: Protein Design Part 1
Homework: Protein Design I
Part A. Conceptual Questions
1) How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
~21% of meat is protein (Smith et al. 2022); therefore, 500 g of meat contains about 105 g of protein.
Using the approximation of an average amino acid ≈ 100 Da ≈ 100 g/mol: 105 g ÷ 100 g/mol ≈ 1.05 mol, i.e. about 1.05 × 6.022 × 10²³ ≈ 6.3 × 10²³ amino acid molecules.
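The arithmetic can be checked with a short sketch. It assumes the 21% protein fraction and the ~100 Da average amino acid mass stated above, and it ignores the water lost when peptide bonds form; the function and variable names are illustrative.

```python
AVOGADRO = 6.022e23  # molecules per mole

def amino_acid_count(meat_g, protein_fraction=0.21, aa_mass_da=100.0):
    """Estimate moles and molecules of amino acids in a piece of meat."""
    protein_g = meat_g * protein_fraction   # grams of protein
    moles = protein_g / aa_mass_da          # 1 Da corresponds to 1 g/mol
    return moles, moles * AVOGADRO

moles, molecules = amino_acid_count(500)
print(f"{moles:.2f} mol, about {molecules:.1e} amino acid molecules")
```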
2) Why do humans eat beef but do not become a cow, eat fish but do not become fish?
Beef/fish supplies raw materials and energy, but it doesn’t transfer “cow/fish identity”. What we eat is digested first, meaning the proteins, fats, and carbohydrates are broken down into small building blocks (amino acids, fatty acids, sugars), absorbed, and then reassembled into human molecules under human genetic and hormonal control.
3) Why are there only 20 natural amino acids?
Doig (2017) hypothesizes that the canonical set of 20 standard amino acids is best understood as an evolved “alphabet” that became fixed early because this set is sufficient and practical for building stable, soluble proteins. This set enables soluble folded structures with close-packed hydrophobic cores and ordered binding pockets, rather than being selected because each amino acid was needed for catalysis (since RNA catalysts were already effective enough). Once early life standardized a working translation system around this set, changing the alphabet would have been costly, so it became effectively locked in (“frozen”). Other references, such as Freeland et al. (2000), suggest that 20 is a good number for minimizing damage from errors (mutation/mistranslation).
4) Where did amino acids come from before enzymes that make them, and before life started?
Amino acids could plausibly have come from abiotic chemistry on early Earth. Proposed routes include cyanosulfidic protometabolism and amino-acid formation from electrical discharges in simple “primitive Earth” gas mixtures (the classic Miller experiment).
5) Can you discover additional helices in proteins?
Beyond the α-helix, proteins commonly contain 3₁₀ helices and π helices (less frequent helical variants), as well as polyproline II helices (common in Pro-rich/disordered regions) and the specialized collagen triple helix.
6) Why are most molecular helices right-handed?
Right-handed helices dominate because natural biomolecules are made from single-handed monomers, and the right-handed twist is the lowest-energy way to repeat their geometry without clashes.
7) Why do β-sheets tend to aggregate?
β-sheet aggregation buries exposed hydrophobic side chains and releases ordered water from their surfaces, which is entropically favorable and lowers the free energy of the system.
8) What is the driving force for β-sheet aggregation?
β-sheet aggregation is driven mainly by the hydrophobic effect and stabilized/propagated by intermolecular backbone H-bonding in the cross-β structure (often reinforced by tight steric-zipper packing).
9) Why do many amyloid diseases form β-sheets?
β-sheet architecture is an unusually generic, stable, and self-templating way for polypeptide backbones to stick together when normal folding fails. In a β-sheet, the peptide backbone forms regular hydrogen bonds. This conformation makes amyloid fibrils thermodynamically stable and hard to clear, because once a small β-sheet nucleus forms, it can seed further growth by recruiting more monomers and templating the same β-rich structure.
Part B: Protein Analysis and Visualization
Question 1
I selected poly(3-hydroxyalkanoate) depolymerase (PhaZ) because it is the key enzyme that degrades PHB, which directly controls whether a microbe accumulates bioplastic (useful for biotechnology) or breaks it down (relevant for environmental fate). phaZ inactivation is commonly discussed as a strategy to reduce PHA mobilization and increase polymer retention.
BLAST Result
Length: 283 aa
Most frequent amino acid: Leucine (L), 32/283 = 11.3%
250 hits
Reviewed (Swiss-Prot) homologs: 1
It belongs to the PHA depolymerase (PhaZ) family, which is part of the broader α/β-hydrolase enzyme superfamily.
Question 3
AF_AFP26495F1 - COMPUTED STRUCTURE MODEL OF POLY(3-HYDROXYALKANOATE) DEPOLYMERASE
This is not an experimentally solved structure, so there is no X-ray/EM “resolution” value. RCSB explicitly states: “There are no experimental data to verify the accuracy of this computed structure model. See Model Confidence metrics below for all regions of the polypeptide chain.” Instead, quality is reported by AlphaFold confidence. Global pLDDT: 91.95 (very high confidence overall)
RCSB lists 1 unique protein chain (monomer A1) and no ligands/non-protein entities.
Structure classification family: InterPro annotations classify it as Poly(3-hydroxyalkanoate) depolymerase (IPR011942) and an alpha/beta hydrolase fold protein (Alpha/beta hydrolase fold-1 domain, AB hydrolase superfamily).
Question 4
I opened AF-Q9R9W3-F1-model_v6 in PyMOL and visualized it in cartoon, ribbon, and ball-and-stick representations.
Colored by secondary structure, it shows a mixed α/β fold with more helices than β-sheets.
Colored by residue type, hydrophobic residues are enriched in the core (and in a few surface patches), while polar/charged residues are mostly surface-exposed, consistent with solubility.
The surface view shows clear cavities/clefts, consistent with potential binding pockets (e.g., a substrate-binding groove typical of hydrolases).
Part C. Using ML-Based Protein Design Tools
For this section, I chose PDB 6J2U as a structural reference. This entry contains a heterodimeric complex between MelC1, the tyrosinase caddy/cofactor protein, and MelC2, the tyrosinase enzyme from Streptomyces avermitilis. For my analysis, I focused on the MelC2 tyrosinase chain (6J2U_2: Represented by Chain B).
a) I used the Chain B sequence from PDB 6J2U, including the N-terminal expression tag present in the deposited sequence.
b) The vertical darker columns at certain positions are highly constrained residues where most substitutions are penalized. That usually indicates structural importance (core packing, tight turns, or residues critical for fold stability). Positions with mostly neutral colors across many substitutions are likely surface-exposed or in flexible loops, where the model predicts more tolerance.
After generating the ESM2 mutational scan heatmap, I found it difficult to confidently interpret specific patterns by visual inspection alone, because the plot compresses many residues and mutations into a dense matrix. To make the interpretation more objective, I used ChatGPT to help me write small analysis snippets that quantify the heatmap directly. I ran a script to calculate the average ESM2 score for each mutant amino acid across all positions and found that substitutions to cysteine are broadly disfavored across the MelC2 sequence.
In fact, in the heatmap, the cysteine row apparently shows many strongly negative scores, suggesting that the model predicts cysteine mutations to be poorly compatible with this protein. This makes biological sense because cysteine can introduce reactive thiol chemistry, unwanted disulfide-like interactions, or local structural constraints that may disrupt folding or stability, especially in a soluble bacterial enzyme where cysteine is not broadly used as a tolerated replacement.
Question 2
During the latent space analysis, I tried to use the provided SCOPe/Astral sequence dataset from the notebook, but I could not load it correctly in Colab. When I attempted to display sequences, I got an IndexError: list index out of range, which indicated that no sequence records had been parsed.
At first, I tested whether the issue was caused by comment lines before the first FASTA entry and tried using the fasta-pearson parser. After further debugging with Gemini/AI assistance in Colab, the issue appeared to be that the dataset URL was not returning the expected FASTA file, but HTML content instead.
I also tried opening the SCOPe/Astral page manually in the browser, but the site displayed an anti-bot verification page and did not provide access to the dataset.
Because of this, Biopython could not parse the dataset, so I was not able to generate the reduced-dimensionality map or place my protein in it.
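A minimal sketch of the check that would have caught this earlier: test whether the downloaded text is FASTA or an HTML page before handing it to a parser. It uses only the standard library rather than Biopython, and the function names and example strings are hypothetical.

```python
def looks_like_fasta(text):
    """Heuristic: not an HTML/XML page, and at least one '>' header line."""
    stripped = text.lstrip()
    if not stripped or stripped.startswith("<"):
        return False
    return any(line.startswith(">") for line in stripped.splitlines())

def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

# Hypothetical examples: an anti-bot HTML page and a tiny FASTA file.
html_page = "<!DOCTYPE html>\n<html><body>Verifying you are human</body></html>"
fasta_text = ">d1abc_ example entry\nMKT\nAYI\n>d2xyz_ second entry\nGSH\n"

for name, text in [("html_page", html_page), ("fasta_text", fasta_text)]:
    print(name, looks_like_fasta(text), len(parse_fasta(text)))
```

An empty record list from the parser is what later surfaces as an IndexError when the first sequence is accessed.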
If the dataset had loaded correctly, my workflow would have been:
Parse the SCOPe/Astral FASTA dataset.
Add the MelC2 Chain B sequence to the dataset.
Generate ESM2 embeddings for all sequences.
Reduce the embeddings using t-SNE.
Highlight MelC2 in the resulting map.
Compare MelC2 to its nearest neighbors.
a) Use the provided sequence dataset to embed proteins in reduced dimensionality
I attempted to use the provided SCOPe/Astral sequence dataset, but the file could not be accessed correctly. The downloaded content was HTML rather than a valid FASTA file, so I could not generate ESM2 embeddings from the provided dataset.
b) Analyze the different formed neighborhoods: do they approximate similar proteins?
Since the dataset could not be parsed, I could not generate or analyze the embedding neighborhoods directly. Conceptually, I would expect ESM2 embeddings to place proteins with related sequence-level features, domains, motifs, or families closer together, but I could not verify this with the provided dataset.
c) Place your protein in the resulting map and explain its position and similarity to its neighbors
My plan was to add MelC2 tyrosinase from PDB 6J2U Chain B to the dataset before embedding, then inspect whether it clustered near related proteins such as oxidoreductases, tyrosinases, or metal-binding enzymes. Since the dataset could not be accessed correctly, I could not place MelC2 in the final map, so this remains a planned analysis rather than a completed result.
C2. Protein Folding
Question 1
I folded the MelC2 tyrosinase Chain B sequence from PDB 6J2U using ESMFold. The input sequence was 285 amino acids long. The prediction completed successfully with pTM = 0.906 and average pLDDT = 86.743, suggesting that ESMFold produced a high-confidence global fold for MelC2.
At the fold level, yes: the ESMFold prediction appears broadly consistent with the original MelC2 structure. However, I did not calculate RMSD, and the original PDB structure includes MelC2 in complex with MelC1 and metal ions, while ESMFold predicts from sequence alone. Therefore, I interpret the result as a strong qualitative fold-level match, not a precise coordinate-level comparison.
To test whether the MelC2 predicted structure is resilient to a small sequence change, I first introduced a single point mutation into the Chain B sequence.
I used the following simple Python function to generate the mutant sequence here.
I selected position 100, where the native residue is serine (S), and mutated it to cysteine (C):
I then used this S100C MelC2 mutant sequence as the input for a new ESMFold prediction, so I could compare its predicted fold and confidence scores with the native MelC2 prediction.
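A minimal sketch of a point-mutation helper of this kind (not necessarily identical to the linked script). Positions are 1-based, the optional `expected_wt` argument guards against off-by-one errors, and the toy sequence is a placeholder rather than the MelC2 sequence.

```python
def point_mutation(sequence, position, new_residue, expected_wt=None):
    """Return a copy of `sequence` with the 1-based `position` replaced."""
    index = position - 1
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if expected_wt is not None and sequence[index] != expected_wt:
        raise ValueError(
            f"expected {expected_wt} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + new_residue + sequence[index + 1:]

toy = "MKTSAYIAKQ"  # toy placeholder, NOT the MelC2 sequence
mutant = point_mutation(toy, 4, "C", expected_wt="S")  # an "S4C" substitution
print(toy)
print(mutant)
```

For the real case, the call would take the Chain B sequence with position 100, new residue "C", and expected wild-type residue "S".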
The S100C mutant produced almost the same ESMFold confidence scores as the native sequence.
Length: 285
ptm: 0.906
plddt: 86.874
After introducing the S100C mutation, the predicted structure still appeared compact and globular, with no obvious large-scale disruption compared to the native model. This suggests that MelC2 is structurally resilient to this single substitution at the overall fold level.
Mutant 2
I generated this 16-amino-acid segment-level mutant using a short Python script suggested by ChatGPT here. The script replaced residues 120-135 of the native MelC2 sequence with a glycine-rich segment while preserving the original protein length. I used this to test whether the predicted MelC2 fold is resilient to larger local sequence disruption.
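A minimal sketch of a length-preserving segment replacement (not necessarily the linked script). Positions are 1-based and inclusive; the toy sequence embeds the 16-residue native segment RSLDGRVMDGPFAAST between placeholder flanks, so the coordinates here are 5-20 rather than 120-135.

```python
def replace_segment(sequence, start, end, filler="G"):
    """Replace 1-based inclusive residues start..end with repeats of `filler`."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("segment lies outside the sequence")
    length = end - start + 1
    return sequence[:start - 1] + filler * length + sequence[end:]

# Placeholder flanks ("AAAA", "CCCC") around the native 16-residue segment.
toy = "AAAA" + "RSLDGRVMDGPFAAST" + "CCCC"
mutant = replace_segment(toy, 5, 20)
print(mutant)
```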
The segment mutant produced a lower-confidence ESMFold prediction than the native and S100C sequences. The native MelC2 model had pTM = 0.906 and pLDDT = 86.743, while the segment mutant dropped to pTM = 0.865 and pLDDT = 81.386.
Visually, the predicted structure still formed a compact globular fold, so the protein did not appear completely disrupted. However, the decrease in both pTM and pLDDT suggests that replacing residues 120-135 with glycines weakened the model’s confidence in the fold.
This makes sense because a glycine-rich replacement can increase flexibility and remove side-chain interactions that may help stabilize the local structure. Still, these are structure predictions only; experimental testing would be needed to know whether catalytic activity or copper/metal-related function is preserved.
In summary, the fold is still predicted but with lower confidence, suggesting that this perturbation affected predicted structural stability more than the point mutation did.
C3. Protein Generation
Question 1
I used ProteinMPNN to redesign the MelC2 chain from PDB 6J2U. I set Chain B as the designed chain and kept Chain A fixed, since Chain B is MelC2 tyrosinase and Chain A is the MelC1 caddy/cofactor protein.
ProteinMPNN used 273 resolved residues from Chain B and generated a redesigned sequence with:
Native score: 1.2305
Designed score: 0.7427
Sequence recovery: 0.5751
The sequence recovery means that about 57.5% of the redesigned residues matched the native MelC2 sequence. This suggests that ProteinMPNN found a sequence predicted to fit the same backbone while changing a substantial part of the original sequence.
However, this only suggests structural compatibility. It does not prove that the redesigned protein would preserve tyrosinase activity, metal binding, or melanin production.
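Sequence recovery is the fraction of aligned positions where the designed residue equals the native one. A minimal sketch follows, assuming both sequences cover the same resolved residues; the toy sequences are placeholders, and the residue count in the last line is derived from the reported 0.5751 and 273 residues rather than taken from ProteinMPNN output.

```python
def sequence_recovery(native, designed):
    """Fraction of positions where the designed residue matches the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)

# Toy placeholders: 6 of 8 positions match.
recovery = sequence_recovery("MKTAYIAK", "MKSAYLAK")
print(recovery)

# Reported recovery of 0.5751 over 273 residues corresponds to about this many matches.
matching_residues = round(0.5751 * 273)
print(matching_residues)
```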
Question 2
I folded the ProteinMPNN-designed MelC2 sequence with ESMFold to test whether the redesigned sequence still predicts a MelC2-like structure.
| Sequence | Length | pTM | pLDDT | Interpretation |
| --- | --- | --- | --- | --- |
| Native MelC2 | 285 aa | 0.906 | 86.743 | High-confidence native fold prediction |
| ProteinMPNN design | 273 aa | 0.878 | 80.444 | Still folds with good confidence, but lower than native |
The ProteinMPNN-designed sequence produced pTM = 0.878 and pLDDT = 80.444. These scores are lower than the native MelC2 prediction, but still reasonably high, suggesting that the redesigned sequence remains structurally compatible with the MelC2 backbone.
Because the designed sequence had only 57.5% sequence recovery, it is substantially different from native MelC2. However, ESMFold still predicted a compact fold with good confidence. This suggests that ProteinMPNN generated a sequence that may preserve the overall structure, although this does not prove preservation of tyrosinase activity, metal binding, or melanin production.
Final Conclusions
| Sequence / model | Type of test | Change introduced | Length | pTM | avg pLDDT | Result / interpretation |
| --- | --- | --- | --- | --- | --- | --- |
| Native MelC2 | Baseline ESMFold prediction | Original MelC2 Chain B sequence from PDB 6J2U | 285 aa | 0.906 | 86.743 | High-confidence compact fold. Used as the reference for comparison. |
| S100C mutant | Point mutation | Serine at position 100 replaced by cysteine | 285 aa | 0.906 | 86.874 | Scores were essentially unchanged compared with native MelC2. The global fold appears resilient to this single point mutation. |
| Segment mutant 120-135 Gly | Large local perturbation | Residues 120-135, RSLDGRVMDGPFAAST, replaced with 16 glycines | 285 aa | 0.865 | 81.386 | Still predicted to fold, but with reduced confidence. This suggests the global fold is not destroyed, but the perturbation affects structural confidence/stability more than the point mutation. |
| ProteinMPNN-designed MelC2 | Inverse-folding design + ESMFold validation | ProteinMPNN redesigned Chain B using the 6J2U backbone; sequence recovery = 0.5751 | 273 aa | 0.878 | 80.444 | Still predicted to fold with reasonably good confidence, despite only ~57.5% sequence recovery. Suggests the backbone can support substantial sequence variation, but function is not guaranteed. |
Overall, MelC2 appears structurally robust at the global fold level. However, all of these conclusions are structural predictions. A preserved fold does not prove preserved tyrosinase activity, copper/metal binding, or melanin production. Functional validation would still require experimental testing.
Part D. Group Brainstorm on Bacteriophage Engineering
My group and I are conducting research for the group phage project. We have set up a shared Google Doc (screenshot below).
Phage reading material
We reviewed the Week 4 phage reading material and used it to focus the proposal on the MS2 L protein, especially its stability, DnaJ dependence, membrane insertion, and lysis function.
From the proposed bacteriophage engineering goals, our group focused on: Increased stability of the L protein
Our short group plan was to use computational protein design tools to identify mutations that could improve the stability of the MS2 L protein. One possible direction was to make the L protein less dependent on the bacterial chaperone DnaJ by identifying mutations that could improve folding, membrane insertion, or oligomerization.
We proposed using:
Protein language model mutational scoring
In silico mutagenesis
Experimental L-protein mutant data
Biological reasoning based on known L-protein functional regions
These tools can help prioritize mutations before experimental testing. Protein language model scores can identify substitutions that are sequence-compatible, while experimental mutant data and biological reasoning can help filter candidates based on possible effects on DnaJ dependence, membrane behavior, and lysis function.
Potential pitfalls: One pitfall is that positive LLR scores may reflect sequence plausibility, but not necessarily improved lysis function. A second pitfall is that increasing protein stability may not always improve function, because L-protein activity may require flexibility, membrane disruption, or host-factor interaction.
Pipeline schematic
MS2 L-protein sequence: mutational scoring notebook → shortlist positive-scoring substitutions → compare with experimental L-protein mutant data → map candidates to functional regions → select mutations for future experimental testing
Individual plan / contribution
My individual contribution was to select candidate MS2 L-protein mutations by combining LLR scores, experimental mutant data, and biological reasoning.
I selected two soluble-region mutants, S9Q and C29R, to probe folding and possible DnaJ dependence. I also selected three transmembrane-region mutants, A45L, T52L, and N53L, to probe membrane insertion and oligomerization.
| Mutant | Region | LLR | Rationale |
| --- | --- | --- | --- |
| S9Q | Soluble / N-terminal | 2.014 | May affect folding or DnaJ-related surface chemistry |
| C29R | Soluble / N-terminal | 2.395 | Strong positive score; may alter chaperone-recognition surfaces |
| A45L | Transmembrane | 1.539 | May increase hydrophobic packing and membrane stability |
| T52L | Transmembrane | 1.814 | Polar-to-hydrophobic change that may improve membrane compatibility |
| N53L | Transmembrane | 1.865 | Additional transmembrane-stabilizing candidate |
Use of AI assistance
I used ChatGPT as a writing and organization assistant to help structure this section and make sure the required items were clearly addressed. I reviewed, edited, and finalized the scientific content myself.
Week 5 HW: Protein Design Part 2
Part A: SOD1 Binder Peptide Design (From Pranam)
Part 1: Generate Binders with PepMLM
Question 1
This is the human SOD1 sequence from UniProt (P00441), with the initiator Met removed:
ATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
FASTA
Introducing the A4V mutation, which is associated with the most aggressive forms of ALS:
ATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
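A quick way to confirm that only the intended substitution was introduced is to diff the two sequences position by position. The sketch below uses just the first 10 residues of each sequence above to stay short, with numbering following the mature protein (initiator Met removed); the function name is illustrative.

```python
def substitutions(wild_type, mutant):
    """List substitutions between two equal-length sequences, e.g. ['A4V']."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have the same length")
    return [
        f"{a}{i}{b}"
        for i, (a, b) in enumerate(zip(wild_type, mutant), start=1)
        if a != b
    ]

wt_prefix = "ATKAVCVLKG"   # first 10 residues of the wild-type sequence above
mut_prefix = "ATKVVCVLKG"  # first 10 residues of the mutant sequence above
found = substitutions(wt_prefix, mut_prefix)
print(found)
```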
Question 2 and 3
With the help of ChatGPT and Gemini, I added two new cells to generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence.
Interpretation: The perplexity score is PepMLM’s confidence in the peptide under its generative model. PepMLM perplexity can be interpreted this way: lower = higher confidence
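As a reminder of what the number means, perplexity in general is the exponential of the negative mean log-probability per token. The sketch below shows that general definition only; PepMLM's exact scoring procedure (for example, how peptide positions are masked) is not reproduced here, and the probabilities are made-up illustrative values.

```python
import math

def perplexity(token_log_probs):
    """exp of the negative mean natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Illustrative values: a 12-residue peptide where the model assigns each
# residue probability 0.5 (confident) versus 0.05 (uncertain).
confident = perplexity([math.log(0.5)] * 12)
uncertain = perplexity([math.log(0.05)] * 12)
print(round(confident, 3), round(uncertain, 3))
```

A perplexity of k roughly means the model is as uncertain as if it were choosing uniformly among k residues at each position, which is why lower values indicate higher confidence.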
PepMLM assigns higher confidence to the four generated peptides than to the known binder under this scoring scheme, with WRYGAAAVEWKE ranked best (lowest perplexity).
The known binder has higher perplexity, suggesting it is less consistent with PepMLM’s learned binder distribution for this target, even though it is experimentally reported to bind. This highlights that PepMLM perplexity is not an experimental binding score. Also, it suggests that perplexity alone is insufficient to validate binding.
Because I found this surprising, I looked for checks I could run to see whether it was an error or an artifact:
Conclusion
My generated peptides are enriched in W/V/A/Y and look like classic short hydrophobic binders. The known binder FLYRWLPSRRGG has a highly charged tail (RRGG) and a different composition pattern, which the model may assign low probability to even if it binds in reality.
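The composition comparison can be made explicit with a small residue count. This is a sketch using the five peptide sequences from this section; the helper name is illustrative.

```python
from collections import Counter

generated = ["HRVPVAGVEWWE", "WSYYVTAVAHKE", "WRYGAAAVEWKE", "WSVPVVAIEHGE"]
known_binder = "FLYRWLPSRRGG"

def composition(peptides):
    """Residue frequencies pooled over a list of peptides, most common first."""
    counts = Counter("".join(peptides))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.most_common()}

generated_comp = composition(generated)
known_comp = composition([known_binder])

for label, comp in [("generated", generated_comp), ("known binder", known_comp)]:
    top = ", ".join(f"{aa}={freq:.2f}" for aa, freq in list(comp.items())[:5])
    print(f"{label}: {top}")
```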
At first, I mistakenly evaluated all peptides in the same run.
Then I noticed that the AlphaFold Server treated that as one multi-chain complex with 6 chains total (SOD1 + 4 generated peptides + the known binder). To compare them, I had to run 5 separate jobs.
SOD1 + HRVPVAGVEWWE: ipTM = 0.34; pTM = 0.86
Where does the peptide appear to bind?
The peptide is positioned along an external surface of the SOD1 β-strand core, contacting a β-sheet edge/adjacent loop (surface-bound).
SOD1 + WSYYVTAVAHKE: ipTM = 0.22; pTM = 0.81
Where does the peptide appear to bind?
The peptide shows weak localization and appears loosely associated with the protein surface, without a clearly defined contact region.
SOD1 + WRYGAAAVEWKE: ipTM = 0.41; pTM = 0.85
Where does the peptide appear to bind?
The peptide is placed near a β-barrel edge/loop region on the outer surface of SOD1 (surface-bound).
SOD1 + WSVPVVAIEHGE: ipTM = 0.44; pTM = 0.86
Where does the peptide appear to bind?
The peptide is positioned on a distinct surface patch on the β-barrel face/edge, appearing more localized than the others (surface-bound).
SOD1 + FLYRWLPSRRGG (known binder): ipTM = 0.30
Where does the peptide appear to bind?
The peptide contacts the protein surface and appears partially inserted into a shallow surface groove/cleft (partially buried relative to the others).
The observed ipTM values are uniformly low (0.22–0.44), indicating limited AlphaFold3 confidence in any specific peptide–SOD1 interface. Among the PepMLM-generated candidates, WSVPVVAIEHGE (ipTM = 0.44), WRYGAAAVEWKE (0.41), and HRVPVAGVEWWE (0.34) score higher than the known binder FLYRWLPSRRGG (0.30), while WSYYVTAVAHKE (0.22) scores lower. Overall, three of the four PepMLM-generated peptides exceed the known binder by ipTM, but the absolute scores suggest weakly supported, mostly surface-associated binding modes rather than a high-confidence complex.
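The comparison against the known binder can be tabulated directly from the five ipTM values reported above. This is a small sketch; the variable names are illustrative.

```python
# ipTM values from the five separate AlphaFold Server jobs reported above.
iptm = {
    "HRVPVAGVEWWE": 0.34,
    "WSYYVTAVAHKE": 0.22,
    "WRYGAAAVEWKE": 0.41,
    "WSVPVVAIEHGE": 0.44,
    "FLYRWLPSRRGG": 0.30,  # known binder (control)
}
known = "FLYRWLPSRRGG"

ranked = sorted(iptm, key=iptm.get, reverse=True)
above_known = [p for p in ranked if p != known and iptm[p] > iptm[known]]

for peptide in ranked:
    tag = " (known binder)" if peptide == known else ""
    print(f"{peptide}  ipTM = {iptm[peptide]:.2f}{tag}")
print(f"{len(above_known)} of {len(iptm) - 1} generated peptides exceed the known binder")
```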
Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse
HRVPVAGVEWWE
WSYYVTAVAHKE
WRYGAAAVEWKE
WSVPVVAIEHGE
FLYRWLPSRRGG (control)
Across all five peptides, PeptiVerse predicts solubility = 1.000 and non-hemolytic behavior (hemolysis probabilities 0.035–0.064), so none of the candidates are flagged as poorly soluble or strongly hemolytic. Predicted binding affinities (pKd/pKi) vary and do not track ipTM: the highest-ipTM peptide (WSVPVVAIEHGE, ipTM 0.44) has the lowest predicted affinity (5.338), while WRYGAAAVEWKE has a higher predicted affinity (6.526) but slightly lower ipTM (0.41).
The known binder (FLYRWLPSRRGG) shows mid-range predicted affinity (5.962) and ipTM (0.30). Considering binding prediction plus safety-like properties, WRYGAAAVEWKE best balances the set: it has the highest predicted affinity (6.526), is predicted soluble (1.000), and has low hemolysis probability (0.047), while still achieving a relatively higher ipTM (0.41) compared to most others.
Peptide to advance: WRYGAAAVEWKE - it is predicted to be soluble, low-hemolysis, and has the strongest predicted binding affinity among the tested peptides, with moderate (though still low-confidence) structural support from AlphaFold3 (ipTM 0.41).
Part 4: Generate Optimized Peptides with moPPIt
I used the moPPIt Colab on a GPU runtime and pasted the A4V mutant SOD1 sequence (mature form without initiator Met). Here’s my Colab copy.
I set binder length to 12 aa and generated a pool of candidate peptides using multi-objective guidance. I enabled affinity guidance and included solubility and hemolysis guidance to bias toward more developable peptides.
| Binder (12-aa) | Solubility | Half-life | Affinity |
| --- | --- | --- | --- |
| EWWRERLRQTLI | 0.5833 | 0.5833 | 6.0163 |
| EDWLATLRAATS | 0.5000 | 5.9279 | 5.7517 |
| EEEWRQLQSQYE | 0.8333 | 4.4313 | 6.8902 |
| TEEEGVRWKRGV | 0.7500 | 4.0548 | 6.4628 |
| ELLQWILGITIE | 0.4167 | 13.4681 | 6.1644 |
Compared to PepMLM, moPPIt produces peptides shaped by explicit objectives. PepMLM peptides were more diverse but less controlled with respect to developability properties whereas moPPIt candidates tend to show stronger biases in composition, more consistent physicochemical properties across candidates, and often a narrower “design family” reflecting the guidance constraints. On this run, the moPPIt outputs are more compositionally biased toward charged residues (E/D and R/K), consistent with explicit optimization for solubility and half-life alongside affinity. Here’s a summary interpretation of the results:
Best predicted affinity: EEEWRQLQSQYE (6.8902)
Best predicted solubility: EEEWRQLQSQYE (0.8333)
Best predicted half-life: ELLQWILGITIE (13.4681)
Most “balanced” if you prioritize binding + solubility: EEEWRQLQSQYE (top on both, but not top half-life)
Most “balanced” if you prioritize half-life strongly: ELLQWILGITIE (best half-life, but lowest solubility)
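The per-objective picks above can be reproduced with a few lines. This sketch copies the values exactly as reported in the table, treats higher as better for every column, and uses illustrative key names.

```python
# Values copied as reported by the moPPIt run (higher is treated as better).
candidates = [
    {"binder": "EWWRERLRQTLI", "solubility": 0.5833, "half_life": 0.5833, "affinity": 6.0163},
    {"binder": "EDWLATLRAATS", "solubility": 0.5000, "half_life": 5.9279, "affinity": 5.7517},
    {"binder": "EEEWRQLQSQYE", "solubility": 0.8333, "half_life": 4.4313, "affinity": 6.8902},
    {"binder": "TEEEGVRWKRGV", "solubility": 0.7500, "half_life": 4.0548, "affinity": 6.4628},
    {"binder": "ELLQWILGITIE", "solubility": 0.4167, "half_life": 13.4681, "affinity": 6.1644},
]

def best_by(metric):
    """Return the binder with the highest value for the given metric."""
    return max(candidates, key=lambda c: c[metric])["binder"]

best = {metric: best_by(metric) for metric in ("affinity", "solubility", "half_life")}
lowest_solubility = min(candidates, key=lambda c: c["solubility"])["binder"]
print(best)
print("lowest solubility:", lowest_solubility)
```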
Before any clinical consideration, I would follow a staged evaluation: (1) in silico screening for interface plausibility (AlphaFold3 ipTM/PAE consistency across seeds) plus basic developability predictions (solubility, hemolysis, aggregation risk); (2) in vitro binding assays (SPR/BLI or competition ELISA), stability in serum, and cytotoxicity/hemolysis assays; (3) cell-based assays for functional effect and off-target toxicity; (4) only after robust preclinical evidence, proceed to in vivo PK/PD and safety studies. In other words, moPPIt designs are hypotheses that must be filtered by structural consistency and validated experimentally before any translational claims.
Part B: BRD4 Drug Discovery Platform Tutorial (Gabriele)
Since this part was optional, I decided to skip it for now.
Part C: Final Project: L-Protein Mutants
Phage Lysis Protein Design Challenge
L-Protein Engineering | Option 1: Mutagenesis
I ran the mutational scoring notebook to obtain per-substitution LLR scores and shortlisted mutations with positive scores.
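| Position | Wild_Type_AA | Mutation_AA | LLR_Score |
| --- | --- | --- | --- |
| 50 | K | L | 2.561464 |
| 29 | C | R | 2.395425 |
| 39 | Y | L | 2.241777 |
| 29 | C | S | 2.043149 |
| 9 | S | Q | 2.014323 |
| 29 | C | Q | 1.997047 |
| 29 | C | P | 1.971026 |
| 29 | C | L | 1.960644 |
| 50 | K | I | 1.928798 |
| 53 | N | L | 1.864930 |
| 61 | E | L | 1.818097 |
| 52 | T | L | 1.813966 |
| 50 | K | F | 1.802066 |
| 29 | C | T | 1.797245 |
| 29 | C | K | 1.795876 |
| 5 | F | Q | 1.795244 |
| 5 | F | R | 1.659717 |
| 29 | C | A | 1.648654 |
| 27 | Y | R | 1.628060 |
| 22 | F | R | 1.602028 |
| 5 | F | P | 1.596888 |
| 50 | K | V | 1.594572 |
| 50 | K | S | 1.574555 |
| 5 | F | T | 1.559023 |
| 5 | F | S | 1.556416 |
| 45 | A | L | 1.539248 |
| 39 | Y | S | 1.517457 |
| 27 | Y | S | 1.497052 |
| 40 | V | L | 1.477630 |
| 27 | Y | L | 1.474637 |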
I then intended to cross-check each shortlisted mutation against the experimental mutant dataset (L-Protein Mutants) to see whether the experimental lysis phenotype is directionally consistent with the LLR score.
Only 6 substitutions from my scored shortlist overlapped with the experimental table (C29R, C29S, K50I, K50S, Y27S, Y39S). In the experimental dataset, all overlapping substitutions were labeled as non-lytic (Lysis = 0) despite having positive LLR scores in the notebook. This suggests that, for MS2 L-protein, sequence-only language-model scores may not reliably capture key determinants of lysis (likely influenced by membrane insertion, oligomerization, and host-factor dependence). We therefore should treat LLR scores as a hypothesis generator, not a predictor of functional lysis.
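The cross-check amounts to intersecting the scored shortlist with the experimental table and flagging disagreements. In the sketch below, the LLR values come from the shortlist above, while `experimental_lysis` is a stand-in holding only the six overlapping entries and their reported label (Lysis = 0); the real dataset has more rows, and its layout is not reproduced here.

```python
# LLR scores from the shortlist (subset shown for brevity).
llr = {
    "C29R": 2.395425, "C29S": 2.043149, "S9Q": 2.014323, "K50I": 1.928798,
    "K50S": 1.574555, "A45L": 1.539248, "Y39S": 1.517457, "Y27S": 1.497052,
}

# Stand-in for the experimental table: only the six overlapping substitutions,
# all reported as non-lytic (Lysis = 0).
experimental_lysis = {"C29R": 0, "C29S": 0, "K50I": 0, "K50S": 0, "Y27S": 0, "Y39S": 0}

overlap = sorted(set(llr) & set(experimental_lysis))
# Discordant: positive LLR score but non-lytic phenotype.
discordant = [m for m in overlap if llr[m] > 0 and experimental_lysis[m] == 0]
untested = sorted(set(llr) - set(experimental_lysis))

print("overlap:", overlap)
print("discordant:", discordant)
print("no experimental data:", untested)
```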
I selected five single-point substitutions with positive LLR scores: two in the soluble region (positions 1–40) and three in the transmembrane (TM) region (positions 41–75), as the assignment requires.
Here are the 5 mutants I chose:
Mutant 1 - S9Q (soluble, LLR = 2.014)
Sequence:
METRFPQQQQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
Rationale: High positive score in the soluble region (putative DnaJ-interaction domain). Ser→Gln increases hydrogen-bonding potential and may alter surface chemistry without strongly destabilizing the fold.
Mutant 2 - C29R (soluble, LLR = 2.395)
Sequence:
METRFPQQSQQTPASTNRRRPFKHEDYPRRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
Rationale: One of the strongest positive-scoring substitutions in the soluble region. Adds a positive charge that could reshape chaperone-recognition or interaction surfaces.
Mutant 3 - A45L (TM, LLR = 1.539)
Sequence:
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLLIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
Rationale: Hydrophobic substitution in the transmembrane segment. Ala→Leu increases hydrophobicity and may stabilize membrane helix packing/insertion and oligomer stability.
Mutant 4 - T52L (TM, LLR = 1.814)
Sequence:
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFLNQLLLSLLEAVIRTVTTLQQLLT
Rationale: Polar→hydrophobic change in the TM region. Thr→Leu may increase membrane compatibility and reduce local insertion/misfolding penalties.
Mutant 5 - N53L (TM, LLR = 1.865)
Sequence:
METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTLQLLLSLLEAVIRTVTTLQQLLT
Rationale: Polar→hydrophobic change in the TM region with a strong positive score. Selected as an additional TM-stabilizing candidate.
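As a sanity check, each listed sequence can be regenerated from the mutation name and compared with what was submitted. In this sketch the wild-type sequence is reconstructed by reverting the listed substitutions (it is not copied from an external source), the region boundaries follow the 1–40 / 41–75 split used above, and the helper names are illustrative.

```python
import re

# Wild type reconstructed by reverting the substitutions in the five mutants.
WILD_TYPE = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"

LISTED = {
    "S9Q":  "METRFPQQQQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT",
    "C29R": "METRFPQQSQQTPASTNRRRPFKHEDYPRRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT",
    "A45L": "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLLIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT",
    "T52L": "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFLNQLLLSLLEAVIRTVTTLQQLLT",
    "N53L": "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTLQLLLSLLEAVIRTVTTLQQLLT",
}

def parse_mutation(name):
    """Split a name like 'S9Q' into (wild-type residue, 1-based position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", name)
    if match is None:
        raise ValueError(f"cannot parse mutation {name!r}")
    return match.group(1), int(match.group(2)), match.group(3)

def apply_mutation(sequence, name):
    """Apply one substitution, checking the wild-type residue first."""
    wt, pos, new = parse_mutation(name)
    if sequence[pos - 1] != wt:
        raise ValueError(f"{name}: found {sequence[pos - 1]} at position {pos}")
    return sequence[:pos - 1] + new + sequence[pos:]

def region(position):
    """Soluble region is positions 1-40; transmembrane is 41-75."""
    return "soluble" if position <= 40 else "transmembrane"

results = {}
for name, listed_seq in LISTED.items():
    rebuilt = apply_mutation(WILD_TYPE, name)
    results[name] = (region(parse_mutation(name)[1]), rebuilt == listed_seq)
    print(name, *results[name])
```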
Week 6 HW: Genetic Circuits Part 1: Assembly Technologies
Assignment: DNA Assembly
Question 1: What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
Phusion High-Fidelity PCR Master Mix is a 2X, ready-to-use mixture where the exact formulation is partly proprietary, but the functional components are documented in the manufacturer’s manual:
| Component (Phusion 2X Master Mix) | Purpose |
| --- | --- |
| Phusion High-Fidelity DNA Polymerase | DNA synthesis with high fidelity + proofreading |
| dNTPs (dATP, dCTP, dGTP, dTTP) | Building blocks for new DNA strands |
| HF reaction buffer (salts + pH buffer) | Maintains optimal pH/ionic strength for enzyme function |
| Mg2+ (via buffer system; often MgCl2-derived) | Essential polymerase cofactor |
| Stabilizers / additives (partly proprietary) | Improve enzyme stability and consistency |
| Nuclease-free water | Solvent to reach correct 2X working concentrations |
Higher Ta increases stringency, reduces non-specific binding
Question 3: There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.
| Aspect / Decision point | PCR (amplification) | Restriction enzyme (cutting) |
| --- | --- | --- |
| What it does | Amplifies a defined region between two primers | Cuts existing DNA at specific recognition sites |
| Input | Template DNA + primers | DNA substrate (plasmid/PCR product/genomic DNA) + restriction enzyme(s) |
| Key reagents | Polymerase mix, primers, dNTPs, buffer, Mg²⁺ | Restriction enzyme(s), buffer, often BSA (enzyme-dependent) |
| Protocol core steps | Denature → anneal → extend (cycling) | Incubate DNA with enzyme(s) at recommended temperature/time |
| Sequence requirements | Need primer-binding sites flanking target | Need the enzyme recognition site(s) present in the DNA |
| Output fragment boundaries | Defined by primer positions (base-precise) | Defined by cut sites (exactly where the enzyme cleaves) |
| Can create new sequences? | Yes - primers can add overhangs/tags/sites | No - only cuts at existing sites (unless sites were engineered earlier) |
| Typical use cases | Generate a specific insert, add adapters, site-directed changes, amplify from low-abundance template | Linearize a plasmid, excise an insert, diagnostic mapping, generate compatible ends for cloning |
| Speed / setup | Moderate - requires optimization (Ta, primers) | Fast/simple if sites exist and enzyme conditions are known |
| Failure modes | Non-specific bands, primer-dimers, no amplification, PCR errors | Star activity (wrong cuts), incomplete digestion, missing sites |
| Fidelity / errors | Depends on polymerase; can introduce mutations | No replication - does not introduce point mutations |
| When preferable | When you need a specific fragment and/or to add features (overhangs, tags), or template amount is low | When the fragment is already present and flanked by useful sites; when you need clean linearization/excision without amplification |
Question 4: How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?
| Check / requirement | What to do (PCR + digest) | Why it matters for Gibson |
| --- | --- | --- |
| 20–40 bp overlaps at every junction | Design primers so each fragment end has 20–40 bp homology to the adjacent fragment/backbone | Gibson assembly depends on annealing of complementary overlaps |
| Correct orientation of overlaps | Ensure the overlap sequence matches the correct neighbor (A→B, B→C, insert→vector, etc.) | Wrong overlap = wrong assembly or no assembly |
| Linearized backbone | Restriction-digest the vector to a single linear band; gel-purify if needed | Gibson requires a linear backbone (no undigested circular plasmid carryover) |
| Remove template plasmid from PCR | If PCR was from plasmid, treat with DpnI (cuts methylated template) | Prevents parental plasmid background colonies |
| Clean fragment ends (no inhibitors) | Purify PCR and digest products (spin column or gel extraction) | Salts, ethanol, detergents inhibit Gibson enzymes |
| Correct fragment sizes | Run an agarose gel to confirm expected sizes; excise/gel-purify correct bands if mixed | Verifies you’re assembling the intended pieces |
| Avoid duplicate/competing overlaps | Keep overlaps unique (no repeated identical overlap sequences across multiple junctions) | Prevents mis-assembly and rearrangements |
| Overlap doesn’t create strong hairpins/repeats | Check overlap sequences for high secondary structure/repeats | Improves annealing and avoids a drop in assembly efficiency |
| Balanced fragment concentrations | Quantify DNA (Nanodrop/Qubit) and use equimolar amounts; keep total DNA in recommended range | Too much/too little of one piece reduces correct assembly |
| No internal cuts from chosen restriction enzymes | Verify your insert/parts don’t contain the restriction sites used to linearize the vector | Prevents unintended fragmentation or loss of insert |
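The "balanced fragment concentrations" check comes down to a mass-to-mole conversion. The sketch below uses the common approximation of about 650 g/mol per base pair of double-stranded DNA. The fragment lengths and the 0.05 pmol target are hypothetical values chosen only for illustration; the recommended total DNA range should come from the assembly kit's manual.

```python
AVG_BP_MASS = 650.0  # approximate g/mol per base pair of dsDNA


def ng_to_pmol(mass_ng: float, length_bp: int) -> float:
    """Convert a dsDNA mass in ng to pmol for a fragment of the given length."""
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS)


def ng_for_pmol(target_pmol: float, length_bp: int) -> float:
    """Mass in ng needed to reach a target molar amount of a fragment."""
    return target_pmol * length_bp * AVG_BP_MASS / 1000.0


# Hypothetical fragment lengths (bp), for illustration only.
fragments = {"backbone": 5000, "insert_A": 1200, "insert_B": 800}
target_pmol = 0.05  # illustrative equimolar target per fragment

for name, length in fragments.items():
    print(f"{name}: {ng_for_pmol(target_pmol, length):.1f} ng for {target_pmol} pmol")
```

Because mass scales with length, equal nanogram amounts of a long backbone and a short insert are far from equimolar, which is why the conversion is worth doing explicitly.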
Question 5: How does the plasmid DNA enter the E. coli cells during transformation?
Plasmid DNA enters E. coli cells during transformation through transient permeabilization of the cell envelope. This can happen via either:
Electroporation: a short electric pulse creates temporary membrane pores that let DNA pass into the cytoplasm.
Chemical (heat-shock) transformation: divalent cations (e.g., Ca²⁺) reduce electrostatic repulsion between DNA and the membrane, and a brief heat shock promotes DNA uptake through temporary pores/defects.
Question 6: Describe another assembly method in detail (such as Golden Gate Assembly)
a) Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online).
Golden Gate Assembly is a molecular cloning technique that allows multiple DNA fragments to be assembled simultaneously in a single reaction. It uses Type IIS restriction enzymes such as BsaI, BsmBI, or BbsI, which cut DNA outside their recognition sequence and generate custom sticky ends. By placing Type IIS restriction sites around each fragment and designing specific 4-bp overhangs that are complementary only to the intended neighboring fragment, the order and orientation of DNA assembly are precisely controlled. During the reaction, the restriction enzyme digests the DNA fragments while T4 DNA ligase simultaneously ligates matching overhangs in the same tube, making the process efficient and rapid. Because the restriction sites are removed during assembly, the correctly assembled construct cannot be cut again, while products that still carry a site (such as re-ligated starting fragments) continue to be digested, driving the reaction toward the desired product. The reaction is typically performed in a thermocycler alternating between ~37 °C (optimal for digestion) and ~16 °C (optimal for ligation). This method is widely used in synthetic biology because it enables scarless assembly of many DNA parts, although internal Type IIS restriction sites must first be removed, usually by silent mutations.
Golden Gate Assembly – Step-by-Step Diagram
Step 1: Design fragments with Type IIS sites
Vector: [BsaI]─────────────[BsaI]
Fragment A: [BsaI]──Part A──[BsaI]
Fragment B: [BsaI]──Part B──[BsaI]
Inward-facing BsaI sites. Overhangs are designed to match the next fragment.
Step 2: Type IIS cuts outside recognition sites
Vector: CGAA─────GCTT (overhangs)
Fragment A: GCTT─────AATG (overhangs)
Fragment B: AATG─────CGAA (overhangs)
Recognition sites (BsaI) are removed on the small excised pieces.
Step 3: Annealing of fragments
Vector ─────GCTT
Fragment A GCTT─────AATG
Fragment B AATG─────CGAA
Overhangs anneal only to the correct partner. Orientation is fixed.
Step 4: Ligase seals fragments
Final construct:
Vector ── Fragment A ── Fragment B
Scarless assembly. BsaI sites are gone, so the construct is stable.
Step 5: Reaction drives correct assembly
Misassembled fragments still have exposed BsaI sites → cut again
Correct product accumulates over multiple cycles
Key Points:
Modular → promoters, RBS, genes, terminators
Multi-fragment assembly in one tube
Order & orientation controlled by 4-bp overhangs
Scarless final product
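Since order and orientation depend entirely on the 4-bp overhangs, it helps to screen a proposed overhang set before ordering primers. The sketch below is a minimal check under simple assumptions: each junction is represented by one 4-nt sequence, and the only rules applied are that overhangs must be unique, non-palindromic, and not the reverse complement of another junction. The example set reuses the overhangs from the diagram above; `check_overhangs` is a hypothetical helper, and real designs often add ligation-fidelity data on top of these rules.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def check_overhangs(overhangs: list[str]) -> list[str]:
    """Return a list of problems found in a set of 4-nt junction overhangs."""
    problems = []
    seen = set()
    for oh in overhangs:
        if len(oh) != 4 or set(oh) - set("ACGT"):
            problems.append(f"{oh}: not a 4-nt A/C/G/T overhang")
            continue
        rc = reverse_complement(oh)
        if oh == rc:
            problems.append(f"{oh}: palindromic, can ligate to itself")
        if oh in seen:
            problems.append(f"{oh}: used at more than one junction")
        elif rc in seen:
            problems.append(f"{oh}: reverse complement of another junction")
        seen.add(oh)
    return problems


# Junction overhangs taken from the diagram: vector/A, A/B, B/vector.
junctions = ["GCTT", "AATG", "CGAA"]
print(check_overhangs(junctions) or "no conflicts found")
```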
b) Model this assembly method with Benchling or a similar tool!
I imported the pBBR1MCS-5 sequence as circular DNA (pBBR1MCS-5 (raw)) and imported phaA, phaB, phaC as separate linear DNA sequences.
I checked for internal BsaI sites (GGTCTC) in all sequences: the genes have no BsaI sites, and pBBR1MCS-5 has a single BsaI site, so it is not a Golden Gate destination vector by direct digest. To model Golden Gate anyway, I created a PCR-linearized Golden Gate backbone: I duplicated the plasmid and saved a linear version (pBBR1MCS-5_GG_backbone).
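The BsaI check described above can also be scripted. A site on the bottom strand appears as the reverse complement (GAGACC) when reading the top strand, so both motifs are searched. This is a small sketch rather than the Benchling workflow itself; `find_bsai_sites` is a hypothetical helper, and the `circular` flag simply wraps the sequence so that a site spanning the plasmid origin is not missed.

```python
BSAI_SITE = "GGTCTC"      # BsaI recognition sequence, top strand
BSAI_SITE_RC = "GAGACC"   # reverse complement: a site on the bottom strand


def find_bsai_sites(sequence: str, circular: bool = False) -> list[tuple[int, str]]:
    """Return (1-based start position, strand) for every BsaI recognition site."""
    seq = sequence.upper()
    # For circular DNA, append the first 5 bases so origin-spanning sites are found.
    search = seq + seq[: len(BSAI_SITE) - 1] if circular else seq
    hits = []
    for motif, strand in ((BSAI_SITE, "+"), (BSAI_SITE_RC, "-")):
        start = search.find(motif)
        while start != -1:
            hits.append((start + 1, strand))
            start = search.find(motif, start + 1)
    return sorted(hits)


# Toy sequences, for illustration only.
print(find_bsai_sites("AAAGGTCTCAAA"))
print(find_bsai_sites("TCTCAAAAGG", circular=True))
```

An empty list for each gene and a single hit for the plasmid would be consistent with the result reported above.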
On this linear backbone, I created two endpoint annotations (first ~20 bp and last ~20 bp) to represent that PCR primers would add inward-facing BsaI sites + 4 bp overhangs:
start: BsaI + Overhang OH1 (added by PCR primer)
end: BsaI + Overhang OH4 (added by PCR primer)
To simplify the Benchling model, I represented Golden Gate flanks (inward-facing BsaI sites and 4-bp overhangs) as annotations rather than explicitly adding the flanking sequences. In a real build, these flanks would be introduced via PCR primers or synthesis.
I duplicated each gene to create Golden Gate-ready parts (phaA (codon optimized) anotated, phaB (codon optimized) anotated and phaC (codon optimized) anotated) and defined the assembly overhang scheme for directional order. For each gene, I added annotations with intended Golden Gate junction overhangs:
Left end: Intended Golden Gate overhang: OH1 (conceptual)
Right end: Intended Golden Gate overhang: OH2 (conceptual)
Overhangs were not added as literal sequences; I only annotated the first/last 20 bp to indicate where BsaI-generated 4-bp overhangs would be introduced via primers/synthesis.
For a simplified Golden Gate model in Benchling, I manually constructed the final plasmid sequence by opening pBBR1MCS-5 at the MCS and concatenating the backbone with phaA–phaB–phaC in the intended order. Overhangs/Type IIS flanks were represented as annotations only.
Assignment: Asimov Kernel
My Asimov Kernel notes and all related material are in my repo “Kanbe-Mariana-HW6”. Below is only a selection, so please have a look at the Kernel directly.
HW6: Asimov Kernel
Exercises 1,2:
Exercise 3:
Finding the “Bacterial Demos” public repo
I started analysing the constructs with the Repressilator.
This is the description: “This is a repressilator genetic circuit. It consists of 3 transcription units, where the CDS in each is a repressor that represses the promoter in the next transcription unit. This results in an oscillation of the concentrations of the 3 proteins.”
These 3 constructs use 3 different promoters, so each genetic design produces a different phenotypic output:
J23101 Promoter: A transcription unit with a strong promoter.
J23106 Promoter: A transcription unit with a medium promoter.
J23117 Promoter: A transcription unit with a weak promoter.
Using the Simulation feature, the repressilator was simulated with the following parameters:
Chassis: E. coli
Duration: 408 hours
Timestep: 60 min
Transfection: Transient transfection
This was the output:
Summary of the findings:
The simulation shows rapid initial accumulation followed by relatively stable RNA and protein concentration ranges over time, while endpoint RNAP and ribosome fluxes differ substantially among the three transcription units.
The construct driven by J23101 (strong promoter) shows the highest activity, the one driven by J23106 (medium promoter) shows intermediate activity, and the one driven by J23117 (weak promoter) shows the lowest activity.
Exercise 4: Repressilator reconstructions
I recreated the Repressilator in the empty construct using parts from the Characterized Bacterial Parts repository.
First, I used the Search function in the right-hand menu to find the required bacterial parts. Then, I dragged and dropped the selected parts into the empty construct to assemble the circuit. The final design reproduced the three-transcription-unit repressilator architecture.
After building the construct, I used the Simulator by clicking the play button to test its behavior. I then compared the simulation output with the original Repressilator Construct available in the Bacterial Demos repository.
Repressilator Reconstruction 1
I replaced pLacI (regulated by LacI) with pTetR (regulated by TetR) in the first unit, while all other simulation parameters were kept the same. That means the input regulator of that node changed, but the overall loop structure is preserved.
The goal was to observe whether changing the promoter identity altered the resulting RNA concentrations, protein concentrations, RNAP flux, or ribosome flux compared with the original repressilator design.
Using the Simulation feature, the new pTetR repressilator was simulated with the same parameters as before:
Chassis: E. coli
Duration: 408 hours
Timestep: 60 min
Transfection: Transient transfection
This was the output:
Summary of the findings:
The simulation looks the same because, from the model’s perspective, the system is still a symmetric 3-repressor cycle: each node still produces a repressor and represses the next node. So the dynamics remain qualitatively equivalent.
Repressilator Reconstruction 2:
To experiment with a cyclic repression topology different from TetR → LacI → LambdaCI → TetR, I tried these changes:
Replace pLambdaCI with pLacI: to make two transcription units use the same promoter and see how that would affect the circuit’s behavior.
Replace pLacI with pLambdaCI: to test what happens when I switch which repressor controls that transcription unit.
Replace TetR CDS with LacI CDS: to see how the simulation changes when one repressor is replaced by another and the circuit has less repressor diversity.
I then re-ran the simulation, and these were the plots:
The modified circuit converges to a steady state dominated by LambdaCI, with LacI and TetR near zero, and no oscillatory behavior observed.
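To make the role of the loop topology concrete, here is a minimal sketch of a textbook, dimensionless, protein-only repressilator model, dx_i/dt = alpha / (1 + x_j^n) - x_i, where x_j is the repressor acting on node i. This is not the model used by the Kernel simulator, and `alpha`, `hill`, the time step, and the starting values are arbitrary illustrative choices. It compares a closed three-repressor ring with a variant in which one repression link is removed, which is analogous to (but not the same as) the modified circuit above.

```python
def simulate(repressed_by, alpha=20.0, hill=3.0, dt=0.01, steps=20000,
             start=(1.0, 1.5, 2.0)):
    """Euler-integrate dx_i/dt = alpha / (1 + x_j**hill) - x_i.

    repressed_by[i] is the index j of the repressor acting on node i,
    or None if node i is unrepressed (constitutive production).
    Returns the list of states, one per time step.
    """
    state = list(start)
    trajectory = [tuple(state)]
    for _ in range(steps):
        rates = []
        for i, j in enumerate(repressed_by):
            production = alpha if j is None else alpha / (1.0 + state[j] ** hill)
            rates.append(production - state[i])
        state = [x + dt * r for x, r in zip(state, rates)]
        trajectory.append(tuple(state))
    return trajectory


def late_amplitude(trajectory, node=0):
    """Peak-to-trough range of one node over the second half of the run."""
    tail = [s[node] for s in trajectory[len(trajectory) // 2:]]
    return max(tail) - min(tail)


ring = simulate([2, 0, 1])       # closed loop: each node is repressed by the previous one
chain = simulate([None, 0, 1])   # loop opened: node 0 is no longer repressed

print(f"ring late amplitude:  {late_amplitude(ring):.3f}")
print(f"chain late amplitude: {late_amplitude(chain):.3g}")
```

In this simplified model, sustained oscillation needs both an intact odd-length repression cycle and sufficiently steep, strong repression; opening the loop leaves a cascade that settles to a fixed point.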
Exercise 5
Construct 1
I designed this construct to test high constitutive expression using the strong J23101 promoter placed upstream of LacI, with an A1 RBS to support translation and an L3S2P24 terminator to end transcription. My rationale was to build a simple bacterial circuit with no regulatory feedback, so I would expect continuous LacI expression and relatively high, stable RNA and protein levels in the simulation.
The simulation of this first construct shows rapid initial expression followed by a stable steady state. RNA concentration increases quickly and stabilizes at approximately 0.8 relative units, while protein concentration stabilizes at approximately 0.65. RNAP and ribosome flux are constant, indicating continuous transcription and translation. This matches the expectation for a constitutive expression construct driven by the strong J23101 promoter.
Construct 2
The second construct shows significantly lower expression compared to the first. RNA concentration stabilizes at approximately 0.003 relative units and protein concentration at approximately 0.0025, both much lower than in the strong promoter construct. RNAP and ribosome flux are also reduced. The system still reaches a steady state with constant expression over time, indicating that changing the promoter strength affects the magnitude of expression but not the overall behavior.
Construct 3
For the third construct, I copied the Self-regulating Circuit from the Bacterial Demos repository into my workspace and ran the simulation without modifying its structure. This allowed me to observe the behavior of a circuit with built-in feedback regulation and compare it with the constitutive expression constructs.
The self-regulating circuit shows stable expression over time, reaching a steady state without oscillations. RNA concentration stabilizes at approximately 0.56 relative units and protein concentration at approximately 0.45. RNAP and ribosome flux are constant, indicating continuous but regulated expression. Compared to the constitutive constructs, the expression level is intermediate, reflecting the effect of feedback regulation on maintaining controlled output.
These results show that promoter strength controls expression level, while circuit structure, such as feedback regulation, influences how expression is maintained over time.
Week 7 HW: Genetic Circuits Part 2: Neuromorphic Circuits
Assignment Part 1: Intracellular Artificial Neural Networks (IANNs)
Question 1
Traditional genetic circuits are usually implemented in Boolean (ON/OFF) logic and hand-designed as fixed logic, so representing nuanced behaviors often requires many gates, sharp thresholds, and careful tuning, which can make designs bulky and brittle. As the number of inputs grows, circuit complexity can explode combinatorially: stacking multiple layers and adding intermediate nodes increases metabolic load, failure points, and sensitivity to part-to-part variability. Also, adapting to new targets or a shifting biological context often means redesigning the circuit architecture, not just re-tuning parameters.
Intracellular Artificial Neural Networks (IANNs) are parametric and trainable: they are designed to operate on analog inputs, tolerate noise through distributed computation, and approximate complex decision boundaries without enumerating every logic case. This is more consistent with the noisy and complex nature of biological signals. With an IANN you can adjust “weights” to fit a desired behavior from data (calibration/learning), then iterate as conditions change, which is a highly desirable feature for biological modelling.
Question 2
A useful application for an IANN could be a multi-signal “smart probiotic” controller that decides when to express a therapeutic payload in the gut based on a noisy inflammation signature. This could be a proposed pipeline:
Sensors detect several analog inputs. Each can be mapped to a measurable intracellular signal (e.g., promoters/sensors responding to nitrate/NO, tetrathionate, ROS, and low pH, read out as a transcription rate or a regulator concentration).
The IANN integrates these signals as weighted contributions and computes a graded output: a continuously tunable expression level of a payload gene (e.g., an anti-inflammatory cytokine mimic, a barrier-protective peptide, or a locally acting enzyme), plus an optional reporter for monitoring.
Instead of requiring all conditions to be “true” or “false,” as Boolean models do, the IANN can implement a “risk score” that turns on strongly only when the combined pattern matches inflammation, while remaining low for benign fluctuations. In practice, you would calibrate the weights using training data from known conditions (healthy vs inflamed models) so the output tracks the probability or intensity of the target state.
Limitations / failure modes: IANNs still face real biological constraints such as sensor cross-talk and context effects, which can shift input distributions. Also, weights can drift as cells evolve, and metabolic burden can reduce growth or change the very physiology being measured. The dynamic range and noise of biological parts can compress signals, making it hard to separate “moderate” from “high” states without careful normalization and controls. Time dynamics also matter: inputs arrive on different timescales (transcription vs metabolites), so the network may need memory/filters to avoid reacting to transient spikes, which can substantially increase the complexity of the network. Finally, safety and containment become part of the spec: it is important to define an acceptable balance between type I and type II errors, and you would likely need a kill switch and strict limits on maximum output to avoid unintended activation in off-target contexts.
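The weighted “risk score” described in the pipeline can be sketched as a single artificial neuron: a weighted sum of normalized sensor signals passed through a sigmoid. Everything numeric here is hypothetical. The weights, bias, and input values are illustrative placeholders rather than fitted or measured values, and inputs are assumed to be normalized to the 0 to 1 range.

```python
import math


def risk_score(inputs: dict, weights: dict, bias: float) -> float:
    """Weighted sum of normalized sensor signals passed through a sigmoid."""
    z = bias + sum(weights[name] * value for name, value in inputs.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights and bias: illustrative only, not fitted to any data.
WEIGHTS = {"nitrate": 2.0, "tetrathionate": 3.0, "ros": 1.5, "low_ph": 1.0}
BIAS = -4.0

# Hypothetical normalized sensor readings (0 = no signal, 1 = saturated).
benign = {"nitrate": 0.2, "tetrathionate": 0.0, "ros": 0.3, "low_ph": 0.1}
inflamed = {"nitrate": 0.8, "tetrathionate": 0.9, "ros": 0.7, "low_ph": 0.5}

print(f"benign score:   {risk_score(benign, WEIGHTS, BIAS):.2f}")
print(f"inflamed score: {risk_score(inflamed, WEIGHTS, BIAS):.2f}")
```

The negative bias keeps the output low unless several inputs are elevated together, which is the graded alternative to an AND gate; in a cell, the weights would correspond to tunable part strengths rather than numbers in a dictionary.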
Question 3
Assignment Part 2: Fungal Materials
Question 1
Example 1: Mycelium composite foams (grown on agricultural waste)
Used for protective packaging, insulation panels, acoustic damping, and lightweight cores.
Advantages: renewable feedstocks, low-temperature manufacturing, biodegradable or compostable end-of-life, and tunable density via growth conditions.
Disadvantages: mechanical properties can vary batch-to-batch, moisture sensitivity unless coated, and long-term durability and standards testing can be harder than for petrofoams.
Example 2: Mycelium “leather” (mycelium-based sheets)
Used for footwear, bags, apparel, and upholstery as a leather alternative.
Advantages: avoids the animal leather supply chain, potentially lower land and chemical burden, and tunable texture and thickness.
Disadvantages: still often needs finishing steps for durability and water resistance, performance can lag high-grade leather, and cost and scale are still improving.
Example 3: Fungal biocement or mycelium-bound “bio-bricks”
Used for low-load building blocks, interior architectural elements, and decorative panels.
Advantages: low-energy fabrication, can use local waste substrates, lightweight, and potentially lower embodied carbon than fired bricks or some concretes.
Disadvantages: typically not comparable to concrete for structural strength, humidity and fire performance require careful engineering, and regulatory acceptance is slower.
Example 4: Fungal pigments and dyes (fermentation-derived)
Used for textiles, inks, coatings, and cosmetics.
Advantages: renewable production, avoids some petroleum-derived dye routes, and potentially lower toxic byproducts depending on the process.
Disadvantages: stability and colorfastness can be challenging, purification costs can be nontrivial, and some pigment pathways have safety constraints depending on the organism and compound.
Question 2
One may want to tune mycelium architecture (hyphal branching, wall composition, and crosslinking) to achieve specific strength, flexibility, porosity, and water resistance for composite materials. Another application is producing programmable functional materials by engineering fungi to secrete adhesives, hydrophobins, melanin-like coatings, or crosslinking enzymes so the final material is tougher or more water-stable without heavy post-processing.
Beyond material applications, genetically engineered fungi can be used for biosensing if we add genetic circuits that turn on a visible reporter in response to VOCs, toxins, inflammation markers, or pollutants, enabling living “sensor materials.” They can also be used for biomanufacturing high-value enzymes, small molecules, and therapeutics that benefit from eukaryotic processing or secretion, and for bioremediation by enhancing the breakdown of lignin, plastic additives, dyes, PFAS-like contaminants (where feasible), or heavy-metal binding, depending on pathway and safety constraints.
Fungi can be advantageous over bacteria because filamentous growth lets them act as a self-assembling scaffold, so the organism is both the “factory” and the “fabrication method.” They also offer eukaryotic protein processing because fungi handle disulfide bonds, folding, secretion, and many post-translational modifications better than most bacteria, which matters for secreted enzymes and complex proteins. In addition, fungi naturally secrete many enzymes, which is ideal for biomass conversion and environmental breakdown workflows. Another advantage relative to bacteria is metabolic breadth since fungi often tolerate more extreme acidic conditions and diverse feedstocks, and many are strong at producing secondary metabolites.
However, bioprocesses with engineered fungi may have practical limitations compared with bacteria, such as slower growth and iteration, more complex regulation and morphology (heterogeneity in filamentous cultures can make outputs less uniform), and genetic tools that can be trickier because strain engineering and predictable expression are often less plug-and-play than in E. coli.
Assignment Part 3: First DNA Twist Order
I reviewed the Individual Final Project documentation guidelines, submitted the Google Form with my draft Aim 1, final project summary, HTGAA industry council selections, and shared DNA design folder, and completed Part 3 of the Week 2 DNA Design Challenge by designing and uploading at least one insert sequence. I also documented the backbone vector for synthesis on my website.
Week 9 HW: Cell Free Systems
Homework Part A: General and Lecturer-Specific Questions
General homework questions
Exercise 1
Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell-free expression is more beneficial than cell production.
Cell-free systems allow full and direct control of reaction conditions and components, enabling rapid and flexible experimentation. Here’s a table with the main advantages of cell-free vs in vivo:
| Aspect | Cell-free | In vivo |
| --- | --- | --- |
| Environment control | Direct, tunable | Limited by cell physiology |
| Toxic proteins | Can express | Often lethal to host |
| Reaction conditions | Precisely adjustable | Fixed intracellular state |
| Speed | Minutes-hours | Hours-days |
| Component handling | Add/remove parts | Difficult |
Cases where cell-free is more beneficial
Expression of toxic proteins (e.g., antimicrobial peptides)
Incorporation of non-natural amino acids
Expression of membrane proteins with detergents/liposomes
Rapid prototyping of genetic circuits
Exercise 2
Main components of a cell-free expression system and their role
| Component | Role |
| --- | --- |
| Cell extract (lysate) | Provides ribosomes, enzymes, tRNAs |
| DNA/mRNA | Encodes target protein |
| Amino acids | Building blocks for protein |
| Energy system (ATP, GTP) | Drives transcription/translation |
| Cofactors (Mg²⁺, K⁺) | Maintain enzyme activity |
| Buffer | Stabilizes pH and environment |
Exercise 3
Protein synthesis consumes large amounts of ATP and GTP. Because cell-free reactions lack the metabolic machinery of living cells, these energy molecules are rapidly depleted unless they are regenerated, which causes protein synthesis to stop and reduces yield.
A common way to maintain ATP supply is the phosphoenolpyruvate (PEP) system, in which PEP donates a phosphate group to ADP via pyruvate kinase to regenerate ATP (PEP + ADP → pyruvate + ATP). Other ATP regeneration strategies include creatine phosphate, which transfers a phosphate to ADP via creatine kinase to rapidly regenerate ATP, and glucose-based systems, in which glucose is metabolized through enzymatic pathways to continuously produce ATP over longer reaction times.
PEP and creatine phosphate favor speed and simplicity, whereas glucose-based systems are better suited for longer and more sustainable reactions. Unless the process clearly requires extended reaction time, I would start with the PEP system because it typically delivers faster and higher ATP regeneration with a relatively simple setup.
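The effect of regeneration on yield can be illustrated with a toy ATP budget. This is only a sketch under strong assumptions: synthesis consumes ATP at a first-order rate, product formed is proportional to ATP consumed, pyruvate kinase regenerates ATP at a rate proportional to PEP × ADP, and all rate constants and amounts are arbitrary units invented for illustration, not measured values.

```python
def run_reaction(pep_start, atp_start=1.0, k_use=1.0, k_regen=2.0,
                 dt=0.001, steps=10000):
    """Euler-integrate a toy ATP budget and return the product formed.

    ATP is consumed by synthesis at rate k_use * ATP (yielding ADP and product);
    ATP is regenerated from ADP at rate k_regen * PEP * ADP, consuming PEP.
    All quantities are in arbitrary units.
    """
    atp, adp, pep, product = atp_start, 0.0, pep_start, 0.0
    for _ in range(steps):
        used = k_use * atp * dt
        regenerated = k_regen * pep * adp * dt
        atp += regenerated - used
        adp += used - regenerated
        pep -= regenerated
        product += used
    return product


without_pep = run_reaction(pep_start=0.0)
with_pep = run_reaction(pep_start=5.0)
print(f"product without regeneration: {without_pep:.2f}")
print(f"product with PEP regeneration: {with_pep:.2f}")
```

Without a phosphate donor, the product formed in this model cannot exceed the initial ATP pool, which mirrors the stall described above; adding PEP lets the same ATP pool turn over several times.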
Exercise 4: Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.
Prokaryotic vs eukaryotic cell-free systems
| Aspect | Prokaryotic | Eukaryotic |
| --- | --- | --- |
| Speed | Fast | Slower |
| Cost | Lower | Higher |
| Protein folding | More limited | Better for complex proteins |
| Post-translational modifications | Minimal | Present or more compatible |
| Best suited for | Simple proteins | Complex eukaryotic proteins |
Prokaryotic cell-free systems such as E. coli are faster and less expensive, making them suitable for producing simple proteins that do not require complex folding or post-translational modifications, such as GFP. In contrast, eukaryotic systems are slower and more costly but are better suited for proteins that require proper folding, disulfide bond formation, or eukaryotic processing, such as human antibody fragments.
Exercise 5
To optimize membrane protein expression in a cell-free system, I would design the reaction to include a membrane-like environment during synthesis, using detergents or liposomes to maintain solubility and support proper insertion. I would also optimize reaction conditions such as magnesium concentration and temperature, and add chaperones if necessary, to reduce misfolding and improve overall yield, because membrane proteins are especially prone to misfolding and insolubility in aqueous systems.
| Challenge | Why it occurs | Experimental strategy | Expected benefit |
| --- | --- | --- | --- |
| Misfolding | Membrane proteins contain hydrophobic regions | Add chaperones; optimize temperature | Improves correct folding |
| Aggregation | Hydrophobic segments interact in solution | Add mild detergents (e.g., DDM) | Keeps protein soluble during synthesis |
| Insolubility | No native membrane is present | Add liposomes or nanodiscs | Provides membrane-like environment |
| Low insertion | Protein cannot embed properly in aqueous media | Include membrane mimics during expression | Supports insertion and stabilization |
| Poor yield | Reaction conditions may be suboptimal | Optimize Mg²⁺ and reaction conditions | Increases expression efficiency and stability |
Exercise 6: Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.
Low yield in a cell-free system can result from insufficient transcription, depletion of ATP, degradation of the expressed protein, or poor folding conditions. Troubleshooting should therefore target the limiting step directly: improve template quality if transcription is weak, reinforce energy regeneration if the reaction stalls, inhibit proteases if degradation is suspected, and optimize temperature or folding support if the protein is unstable or misfolded.
Homework question from Kate Adamala
I would design a phospholipid vesicle-based synthetic minimal cell that uses the blue-light regulator EL222 to activate expression of the tyrosinase gene melA, producing melanin as a visible record of cumulative light exposure.
Question 1
A light-exposure logging synthetic minimal cell for integration into a wearable or material patch.
a)
input:
the synthetic cell would detect blue/visible light and respond by producing melanin
a realistic light-sensing module is EL222, a one-component blue-light activated transcription factor from Erythrobacter litoralis that binds DNA upon illumination
output:
gradual, visible darkening that records cumulative exposure over time.
a realistic pigment-output gene is melA, a tyrosinase gene from Rhizobium etli that has been used to generate melanin in E. coli.
b) This function could be realized by cell-free Tx/Tl alone only partially. In bulk cell-free solution, the circuit could still produce melanin, but without encapsulation it would not behave as a discrete synthetic minimal cell and would be harder to localize, stabilize, or integrate into a material as a spatially resolved light-logging unit.
c) This function could also be realized by a genetically modified natural cell. For example, E. coli can be engineered to express melA and produce melanin. A synthetic minimal cell is preferable if the goal is a compartmentalized, material-compatible system rather than a living replicating microbe.
d) The desired outcome is that the synthetic cell becomes darker as cumulative light exposure increases. In a material, a population of these vesicles would function as a distributed exposure log: more illuminated regions would accumulate more melanin and therefore appear darker than shaded regions.
Question 2
a) The membrane would be a phospholipid vesicle, for example POPC + cholesterol, because that is a standard stable composition for synthetic cell vesicles and is also used in related artificial-cell communication systems.
b) Inside the vesicle, I would encapsulate:
an E. coli cell-free transcription/translation system
amino acids, NTPs, salts, and cofactors
an ATP regeneration system, such as PEP + pyruvate kinase
L-tyrosine as the melanin precursor
Cu²⁺ as a cofactor for tyrosinase
DNA encoding the light-response module and melanin-output module
c) For the Tx/Tl source, a bacterial system is sufficient. The core regulator, EL222, is bacterial, and the output enzyme MelA tyrosinase does not require mammalian-specific post-translational processing to function as a pigment-producing enzyme.
d) The synthetic cell would communicate with the environment mainly through light, which crosses the membrane directly, so no membrane channel is required for the input. To simplify the system, I would preload tyrosine and copper inside the vesicle. If I later wanted continuous substrate exchange from the outside, I could add a pore such as α-hemolysin (Hla), which is commonly used in synthetic-cell communication designs.
Question 3 - Experimental details
a)
Lipids: POPC, cholesterol
Genes: EL222 from Erythrobacter litoralis as the light-activated transcription factor; melA from Rhizobium etli as the tyrosinase gene for melanin production; optionally, hla for α-hemolysin if external substrate exchange is needed
Encapsulated reagents: E. coli cell-free lysate or PURE-like system, amino acids, NTPs, PEP, pyruvate kinase, tyrosine, Cu²⁺
b)
I would measure the function of the system by tracking darkening over time, using image analysis and bulk absorbance measurements. The most direct readout is the increase in visible pigmentation of illuminated vesicles relative to dark controls; microscopy could also be used to compare spatial patterns of melanin accumulation across the material.
Homework question from Peter Nguyen
Application field: Textiles / Fashion
One-sentence pitch
A textile integrated with freeze-dried cell-free melanin-producing modules that develops gradual, skin-adjacent tonal changes in response to light exposure, turning the garment into an exposure-recording surface.
How it works
The material would incorporate localized freeze-dried cell-free reaction zones containing the genetic and enzymatic components needed for melanin production, for example a light-responsive regulator such as EL222 coupled to a melanin-producing gene such as melA. When the textile is activated by hydration, these embedded reaction zones become functional and begin responding to light exposure by expressing tyrosinase and generating melanin from preloaded substrate. Over time, more exposed regions of the garment darken more than shaded or covered regions, creating gradients or “tan-line-like” traces directly in the material. Functionally, the textile behaves less like a conventional dyed fabric and more like a programmable, exposure-sensitive biological film.
Societal challenge or market need
This concept addresses the growing interest in responsive and personalized materials in fashion and design, especially materials that are not just decorative but capable of recording use, environment, or time. It also responds to demand for alternatives to static coloration and conventional dyeing by proposing a material whose visual output is generated biologically in place. Beyond fashion, the same platform could be relevant to design objects or artistic textiles that visibly register environmental exposure.
How to address limitations of cell-free reactions
Because freeze-dried cell-free systems require water for activation and are typically limited in duration, I would treat the material as an on-demand activation platform rather than a permanently active textile. The garment could be hydrated only when the user wants to generate a pattern or record a specific exposure event, which also helps manage stability and one-time use.
To improve shelf life, the cell-free modules would remain freeze-dried until use and be stored in sealed conditions.
To improve localization and handling, they could be embedded in discrete patches, printed zones, or replaceable inserts rather than distributed uniformly across the whole textile. This makes the limitation part of the design logic: the material is activated intentionally, records one event or interval, and then remains as the final artifact.
Homework question from Ally Huang
Background information: Space radiation can damage DNA and reduce the reliability of biological systems used for diagnostics, manufacturing, and environmental sensing during long-duration missions. This is significant because future crews will likely depend on compact biotechnology tools rather than constant resupply from Earth. It is relevant for space exploration because cell-free systems are lightweight, storable, and already attractive for use in resource-limited environments. It is scientifically interesting because it links a basic biological question (how nucleic acid damage affects gene expression) to an applied engineering problem (how to maintain functional biotechnology in space).
Molecular or genetic target: Integrity and expression efficiency of a PCR-amplified sfGFP DNA template after radiation-mimicking UV exposure.
How the target relates to the challenge: The sfGFP DNA template serves as a simple reporter for whether a biologically useful DNA sequence remains functional after damage. If radiation-like exposure degrades the template, BioBits cell-free protein expression should produce less GFP signal. This makes the target directly relevant to the space biology challenge, because many space biotechnology applications depend on DNA templates remaining intact enough to be transcribed and translated. Measuring GFP output therefore provides a practical way to estimate how radiation damage could impair future cell-free diagnostics or production systems used in spacecraft or habitats.
Hypothesis or research goal: My hypothesis is that increasing UV exposure, used here as a classroom-accessible proxy for radiation-induced nucleic acid damage, will reduce the ability of a PCR-amplified sfGFP DNA template to produce GFP in the BioBits cell-free expression system. I further expect that templates protected by a shielding condition, such as melanin-containing film or another UV-blocking barrier, will retain more expression than unprotected templates exposed to the same dose. The reasoning is that DNA damage should interfere with transcription and translation by reducing template integrity, while a protective barrier should lower that damage. The research goal is to test whether cell-free fluorescence output can function as a simple readout of DNA stability under space-relevant stress and whether a lightweight protective strategy improves performance.
Experimental plan:
I will amplify an sfGFP template with the miniPCR and divide it into groups:
no UV exposure
low UV
high UV
UV plus shielding
After treatment, each sample will be added to BioBits cell-free reactions. Negative controls will include reactions with no DNA template; positive controls will include unexposed template.
GFP fluorescence will be measured with the P51 Molecular Fluorescence Viewer and quantified by image intensity or relative brightness. The main data will be fluorescence level across conditions, which will indicate how template damage affects expression and whether the shielding condition preserves function.
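One way to compare conditions is to subtract the no-DNA background and scale every reading to the unexposed control. The sketch below illustrates that normalization only; the function name and all intensity values are hypothetical placeholders, not measured data.

```python
def relative_expression(sample, blank, unexposed):
    """Background-subtract a reading and scale it so the unexposed control is 1.0."""
    span = unexposed - blank
    if span <= 0:
        raise ValueError("unexposed control must be brighter than the no-DNA blank")
    return (sample - blank) / span


# Hypothetical mean pixel intensities (0-255), for illustration only.
readings = {
    "no_dna": 12.0,       # negative control (no template)
    "no_uv": 200.0,       # positive control (unexposed template)
    "low_uv": 150.0,
    "high_uv": 60.0,
    "uv_shielded": 170.0,
}

relative = {
    name: relative_expression(value, readings["no_dna"], readings["no_uv"])
    for name, value in readings.items()
    if name != "no_dna"
}

for name, value in relative.items():
    print(f"{name}: {value:.2f} of unexposed signal")
```

Expressing each group as a fraction of the unexposed control makes the shielding effect directly comparable across runs, even if the absolute brightness in the viewer changes between sessions.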
Homework Part B: Individual Final Project
general info / link for my slide in the CT slide deck
Title: Engineering Tunable Skin Pigment Expression in Engineered Living Materials
Aim 1: Generate base data on melanogenesis by mapping key pathways and build an initial genetic circuit informed by this base data to produce tunable pigmentation (eumelanin-biased outputs for darker tones and pheomelanin-biased outputs for warmer tones).
Aim 2: Expand and refine the circuit, select the most promising candidates for wet-lab experimentation, and plan the experiments.
Aim 3: Empirical assays to explore how variables such as pigment amount, distribution, and system conditions affect the final material output.
Industry Council Companies: BioFabricate and Cultivarium
I selected them because they each address a different core part of my project:
BioFabricate could bring strong expertise in translating embedded melanin-related genetic circuits into a desirable (aesthetic and functional) engineered living material, while Cultivarium is well aligned with the wet-lab side of the project, particularly chassis selection, non-model organism engineering, and the practical challenge of implementing and optimizing the circuit in a host such as Komagataeibacter rhaeticus.
Submit the Final Project selection form.
Started planning how I will write my final project documentation based on the guidelines
To be done by April 10 at 11PM ET.
Prepare your first DNA order and put it in the “Twist (MIT)” or “Twist (Nodes)” tab of the 2026 HTGAA Ordering: DNA, Reagents, Consumables spreadsheet, as appropriate.
I will measure visible melanin output in the material as the primary readout of the project.
I want to quantify:
Degree of darkening
Spatial distribution of pigmentation
Stability/Persistence of the pigmentation in the bacterial cellulose / after drying or storage
These measurements are directly relevant because they indicate whether the melanin-producing system is functioning and whether the output is compatible with the intended material application.
How to measure?
a) Initial measurements: Molecular biology
First, to validate the genetic component, I would measure the presence of the designed construct by PCR and confirm the DNA sequence by DNA sequencing. I would use agarose gel electrophoresis to confirm correct DNA assembly before testing expression.
To verify that the melanin-producing pathway is expressed before integration into the material, I could also run the construct in a cell-free or microbial test system and check whether it produces the expected visible darkening before integrating it into bacterial cellulose.
b) Material measurements:
Material-level measurements are the most direct indicators of whether the melanin-producing system is working and whether the output is useful as a material feature rather than only a biochemical signal.
I would first document the material using standardized photography under controlled lighting and then quantify changes in tone by image analysis, comparing pixel intensity or color values across samples and conditions.
As a secondary measurement, I would use UV-Vis absorbance or reflectance spectroscopy, if available, to quantify pigment accumulation more objectively.
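A minimal sketch of the image-analysis step, assuming the photographs have already been reduced to lists of (R, G, B) pixel values from a region of interest. The `darkness_index` name and the example patches are hypothetical; the luminance weights are the Rec. 709 luma coefficients, applied here only as a relative measure between samples photographed under the same lighting.

```python
def darkness_index(pixels):
    """Return 0.0 (white) to 1.0 (black) for an iterable of (R, G, B) tuples in 0-255."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no pixels supplied")
    # Weighted luminance per pixel (Rec. 709 weights, an assumption of this sketch).
    luminance = [0.2126 * r + 0.7152 * g + 0.0722 * b for r, g, b in pixels]
    mean_luminance = sum(luminance) / len(luminance)
    return 1.0 - mean_luminance / 255.0


# Hypothetical regions of interest, for illustration only.
shaded_patch = [(230, 225, 210)] * 4      # pale, little pigment
illuminated_patch = [(90, 70, 50)] * 4    # brown, more pigment

print(f"shaded: {darkness_index(shaded_patch):.2f}")
print(f"illuminated: {darkness_index(illuminated_patch):.2f}")
```

A single scalar per region makes it easy to plot darkening over time or to compare illuminated and dark-control samples; color-specific shifts would need the separate channels.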
Homework: Waters Part 1 — Molecular Weight
Question 1
eGFP (native): ~26.9 kDa
eGFP + LEHHHHHH tag: ~27,875.41 Da
All spaces and line breaks were removed.
Question 2
To calculate the molecular weight of intact eGFP, I selected two adjacent peaks from the LC-MS spectrum at m/z 933.7349 and 965.9684.
Using the adjacent charge state equation, this gives a charge state of approximately 30 for the first peak, meaning the second adjacent peak corresponds to 29. I then used these charge states to calculate the molecular weight from each peak, using the relationship between m/z, charge, and proton mass. This gave values of 27,981.8 Da and 27,983.9 Da, respectively, with an average experimental molecular weight of 27,982.9 Da.
I then compared this experimental value with the theoretical molecular weight of the full eGFP construct, including the LE linker and His tag, which is 28,006.3 Da. The relative error was 0.084%, showing very good agreement between the experimental and predicted values. This indicates that the adjacent charge state method produced an accurate estimate of the intact protein mass.
For the zoomed-in peak near m/z 1474, the charge state can also be reasonably assigned. Based on the experimental molecular weight, a 19+ ion would appear at about m/z 1473.8, which closely matches the observed signal. So yes, the charge state of the zoomed-in peak can be observed, and it is most consistent with z = 19.
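The adjacent charge state calculation above can be reproduced in a few lines. This is a sketch using the peak values, the proton mass (1.0073 Da), and the theoretical mass quoted in the text; the function names are my own.

```python
PROTON = 1.0073  # Da, proton mass value used in this homework


def charge_of_lower_peak(mz_low, mz_high):
    """Charge of the lower-m/z peak, assuming the two peaks differ by one charge."""
    return round((mz_high - PROTON) / (mz_high - mz_low))


def neutral_mass(mz, z):
    """Neutral mass M from an [M + zH]z+ ion."""
    return z * (mz - PROTON)


def expected_mz(mass, z):
    """m/z at which a protein of neutral mass `mass` appears with charge z."""
    return mass / z + PROTON


mz_low, mz_high = 933.7349, 965.9684
z_low = charge_of_lower_peak(mz_low, mz_high)
masses = [neutral_mass(mz_low, z_low), neutral_mass(mz_high, z_low - 1)]
average_mass = sum(masses) / len(masses)

theoretical = 28006.3  # Da, construct mass quoted in the text
relative_error_pct = abs(average_mass - theoretical) / theoretical * 100

print(z_low, [round(m, 1) for m in masses], round(average_mass, 1))
print(f"relative error: {relative_error_pct:.3f} %")
print(f"expected m/z for 19+: {expected_mz(average_mass, 19):.1f}")
```

The same `expected_mz` helper is what supports the z = 19 assignment for the peak near m/z 1474.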
Homework: Waters Part II — Secondary/Tertiary structure
Question 1
This unfolding changes how the protein gets charged during electrospray ionization. In the native state, fewer sites are accessible for protonation, so the protein carries fewer charges and appears at higher m/z values. In the denatured state, more sites are exposed, so the protein can carry more charges, which shifts the signal to lower m/z values.
In the mass spectrum (Figure 2), this shows up clearly. The native protein has a tighter charge state distribution at higher m/z, while the denatured protein has a broader distribution shifted toward lower m/z. So basically, by looking at how the charge state envelope shifts, we can tell whether the protein is folded or unfolded.
Question 2
If we zoom into the peak around m/z ~2800 in the native spectrum, we can determine the charge state by looking at the spacing between the small peaks in the isotope pattern. At high resolution, these peaks are separated by approximately 1/z.
From the inset, the peaks are spaced by about ~0.05–0.06 m/z units. Since the spacing is equal to 1/z, this suggests:
z ≈ 1 / 0.05 ≈ 20
So the charge state is approximately 20+.
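As a cross-check against the protein’s mass (~28 kDa), however, a 20+ monomer ion would appear near m/z ≈ 1400 rather than 2800. An ion at m/z ≈ 2800 with z = 20 corresponds to a mass of about 56 kDa, which could fit an eGFP dimer, whereas a monomer at m/z ≈ 2800 would be 10+ with an isotope spacing of about 0.1. The 20+ assignment therefore rests on the isotope spacing read from the inset and should be confirmed against it.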
Homework: Waters Part III — Peptide Mapping - primary structure
Question 1
Lysine (K): 20
Arginine (R): 6
Total K + R: 26
Number of tryptic peptides generated: 27
To analyze the eGFP standard, I first reviewed the full amino acid sequence provided, including the LE linker and the C-terminal His-tag (HHHHHH). I then identified all lysine (K) and arginine (R) residues, since trypsin cleaves specifically after K and R residues unless the following amino acid is proline (P).
After counting the residues in the sequence using Benchling, I found a total of 20 lysines (K) and 6 arginines (R), for a combined total of 26 potential trypsin cleavage residues.
Question 2
I also checked whether any of these K or R residues were followed by proline, which would block trypsin cleavage, and I found that none of them were followed by P. Therefore, all 26 sites are valid trypsin cleavage sites. Because each cleavage site divides the sequence into peptide fragments, the total number of peptides expected from complete tryptic digestion is the number of cleavage sites plus one. Based on this, the digest should generate 27 peptides in total.
To double-check this, I pasted the eGFP amino acid sequence into the ExPASy PeptideMass tool, selected trypsin as the digestion enzyme, and used the parameters shown in Figure 4, including 0 missed cleavages, monoisotopic mass, and no modifications. I then clicked “Perform the Cleavage” to generate the predicted list of tryptic peptides and determine the total number of peptides produced.
After manually counting 26 lysine and arginine residues, I expected a total of 27 tryptic peptides. When I ran the sequence in the ExPASy PeptideMass tool, the output showed fewer peptides than expected. However, this is because the tool was set to display only peptides with masses greater than 500 Da, which excludes smaller fragments.
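The cleavage rule used above (cut after K or R unless the next residue is P, with no missed cleavages) can be sketched as a small function. The toy sequence below is hypothetical and is not the eGFP construct; it only illustrates that the peptide count equals the number of valid cleavage sites plus one.

```python
import re


def tryptic_peptides(sequence):
    """Split after K or R unless the next residue is P (0 missed cleavages)."""
    # Zero-width split: lookbehind for K/R, negative lookahead for P.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


# Hypothetical toy sequence with one blocked site (R followed by P).
toy = "MAKGERPLLKDEFRHHHHHH"
peptides = tryptic_peptides(toy)

print(peptides)
print("peptides:", len(peptides), "cleavage sites:", len(peptides) - 1)
```

Running the same function on the full construct sequence would give a count to compare against the manual K + R tally, before any mass cutoff such as the 500 Da display filter is applied.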
Question 3
To analyze the peptide map, I examined the total ion chromatogram (TIC) in Figure 5a and focused on the retention time window between 0.5 and 6 minutes. I counted only peaks with a relative intensity greater than approximately 10% of the base peak, as specified. Based on this criterion, I observe approximately 18–20 chromatographic peaks between 0.5 and 6 minutes. The exact number depends slightly on how closely overlapping peaks are resolved, particularly in the region between ~2.5 and 3.5 minutes, where several peaks are closely spaced.
Question 4
The chromatogram shows fewer peaks than the number of peptides predicted in Question 2. There, the full tryptic digest was predicted to generate 27 peptides, whereas counting only peaks above the 10% relative abundance threshold between 0.5 and 6 minutes gives roughly 20 peaks. This likely means that some peptides are too low in abundance, too small, or co-elute with other peptides and therefore do not appear as separate visible chromatographic peaks.
Question 5
To analyze the peptide in Figure 5b, I first identified the most intense peak in the spectrum, which appears at m/z ≈ 525.77. I assumed this corresponds to the most abundant charge state of the peptide.
To determine the charge state, I examined the zoomed-in isotope pattern. The spacing between adjacent isotope peaks is about 0.5 m/z unit. Since isotope spacing is approximately equal to 1/z, a spacing of ~0.5 indicates that z ≈ 2. Based on this, I concluded that the most abundant charge state is z = 2+.
Next, I calculated the mass of the singly charged form of the peptide, M+H+, using the relationship:
M+H+ = z(m/z) − (z − 1)(1.0073)
Substituting the values:
M+H+ = 2(525.77) − 1.0073 ≈ 1050.53 Da
So, the peptide has:
m/z ≈ 525.77
charge state z = 2+
M+H+ ≈ 1050.53 Da
This result is consistent with the spectrum, since there is also a peak visible near m/z ≈ 1050.52, which corresponds to the singly charged form of the same peptide.
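The two steps above (charge from isotope spacing, then conversion to the singly protonated mass) can be written as a short sketch. The values are the ones read from Figure 5b in the text; the function names are my own.

```python
PROTON = 1.0073  # Da, proton mass value used in this homework


def charge_from_isotope_spacing(spacing):
    """Isotope peaks are ~1/z apart on the m/z axis, so z is about 1/spacing."""
    return round(1.0 / spacing)


def singly_protonated_mass(mz, z):
    """[M+H]+ = z * (m/z) - (z - 1) * proton mass."""
    return z * mz - (z - 1) * PROTON


z = charge_from_isotope_spacing(0.5)
mh = singly_protonated_mass(525.77, z)

print(f"z = {z}+, [M+H]+ = {mh:.2f} Da")
```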
Question 6
From the previous step, I determined that the most abundant ion was at m/z 525.7671 with charge z = 2, which gives a singly charged mass of about M+H+ = 1050.53 Da. In the PeptideMass results, the closest expected peptide mass is 1050.5214 Da, which corresponds to the peptide FEGDTLVNR. Based on that match, I identified the peptide as FEGDTLVNR.
To evaluate the mass accuracy, I compared the experimental mass to the theoretical mass from PeptideMass. Using the exact value labeled in the spectrum, the experimental singly charged mass is 1050.52438 Da, and the theoretical mass is 1050.5214 Da. The mass difference is therefore 1050.52438 − 1050.5214 = 0.00298 Da, which is about 2.8 ppm of the theoretical mass.
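As a check on the mass accuracy, the ppm error follows directly from the two masses quoted above. This is a minimal sketch; `ppm_error` is a name I chose for illustration.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million relative to the theoretical mass."""
    return (observed - theoretical) / theoretical * 1e6


error = ppm_error(1050.52438, 1050.5214)
print(f"mass error: {error:.2f} ppm")
```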
I used ChatGPT to help compare the theoretical and experimental subunit masses in the answer below.
To identify the Keyhole Limpet Hemocyanin (KLH)’s oligomeric states in the CDMS spectrum, I used the subunit masses given in Table 1 and multiplied them by the number of subunits expected in each assembly. I then compared those theoretical masses to the labeled peaks in Figure 7.
Here are the results summarized in a table:
| Oligomeric species | Theoretical mass | Peak in the KLH mass spectrum acquired on the CDMS | Interpretation |
| --- | --- | --- | --- |
| 7FU Decamer | 3.4 MDa | ~3.4 MDa | This peak is consistent with the expected mass of a 10-subunit 7FU assembly. |
| 8FU Didecamer | 8.0 MDa | ~8.33 MDa | This is the closest and most intense peak, so it is the strongest candidate for the 8FU didecamer. |
| 8FU 3-Decamer | 12.0 MDa | ~12.67 MDa | This peak is reasonably close to the expected 3-decamer mass and likely represents a higher-order 8FU assembly. |
| 8FU 4-Decamer | 16.0 MDa | ~16-17 MDa | The weak signal in this region may correspond to the 8FU 4-decamer, although this assignment is more tentative. |
Discussion
To interpret the CDMS spectrum, I compared the theoretical oligomer masses calculated from the known KLH subunit masses with the labeled peaks in Figure 7. Based on this comparison, the observed masses are not perfectly identical to the theoretical values, but they are close enough to support these assignments as working hypotheses.
Example proxy calculations:
For the 7FU decamer (10 units): 7FU subunit mass = 340 kDa
Since a decamer contains 10 subunits, the expected mass is: 10 × 340 = 3400 kDa = 3.4 MDa
In the spectrum, there is a labeled peak at about 3.4 MDa, so I would assign that peak to the 7FU decamer. This corresponds to a 4.5 mDa from the x axis analysis.
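The same proxy calculation can be repeated for each assembly and compared with the labeled peaks. In this sketch the 7FU subunit mass (340 kDa) is taken from the text, while the 8FU subunit mass of 400 kDa is an assumption inferred from the 8.0 MDa didecamer value (8.0 MDa / 20 subunits); the observed peaks are the approximate values listed in the table.

```python
# Subunit masses in kDa. The 8FU value is inferred, not read from Table 1.
SUBUNIT_KDA = {"7FU": 340.0, "8FU": 400.0}

# (assembly name, subunit type, number of subunits, observed peak in MDa)
assemblies = [
    ("7FU decamer", "7FU", 10, 3.4),
    ("8FU didecamer", "8FU", 20, 8.33),
    ("8FU 3-decamer", "8FU", 30, 12.67),
]


def theoretical_mda(subunit, copies):
    """Theoretical assembly mass in MDa from subunit mass (kDa) and copy number."""
    return SUBUNIT_KDA[subunit] * copies / 1000.0


report = {}
for name, subunit, copies, observed in assemblies:
    theory = theoretical_mda(subunit, copies)
    deviation_pct = (observed - theory) / theory * 100.0
    report[name] = (theory, deviation_pct)
    print(f"{name}: theory {theory:.1f} MDa, observed {observed} MDa, "
          f"deviation {deviation_pct:+.1f} %")
```

Listing the percent deviation next to each assignment makes it easier to see that the 8FU peaks sit a few percent above the theoretical values, whereas the 7FU decamer matches closely.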
The slight offsets could reflect experimental uncertainty, heterogeneity in the sample, adducting, or the natural structural complexity of KLH. Overall, my interpretation is that the spectrum supports a mixture of KLH oligomeric states, with the 8FU didecamer appearing to be the predominant species and the larger 8FU assemblies likely representing less abundant higher-order associations.
The 8.33 MDa peak is by far the most intense feature in the spectrum. This suggests that the 8FU didecamer may be the dominant oligomeric state in this sample under the conditions used for CDMS.
In contrast, the peaks assigned to the 8FU 3-decamer and especially the 8FU 4-decamer are much less abundant, which may indicate that these larger assemblies are present only as minor populations or form less stably in solution.
For the intact LC-MS measurement in Part 1, taking the absolute value, the mass error is approximately 836 ppm. The observed intact LC-MS mass is close to the theoretical eGFP construct mass, so the data support that the sample is consistent with the expected GFP/eGFP construct.
I used ChatGPT as a writing and reasoning assistant to help review calculations, improve explanations, and check whether my answers addressed the homework prompts. All final interpretations, edits, and submitted content were reviewed by me.
Week 11 HW: Bioproduction & Cloud Labs
Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork
I contributed 7 pixels to the global artwork experiment, helping extend a horizontal yellow line in the top-left area (see screenshot below).
At first, I was cautious and tried to understand the ongoing ideas for each section and whether there was a unifying concept. I considered introducing something new, but ultimately decided to stick with what seemed to be the area’s goal (a horizontal yellow line). For next year, it might be fun to have an in-app chat within the same domain to coordinate contributions more easily and check the current vibes.
Part B: Cell-Free Protein Synthesis | Cell-Free Reagents
Question 1
| Component Category | Component | Corrected role in the cell-free reaction |
| --- | --- | --- |
| Lysate | E. coli Lysate | Provides the endogenous transcription, translation, and metabolic machinery needed for in vitro gene expression. |
| | BL21 (DE3) Star Lysate (includes T7 RNA Polymerase) | Provides the same core lysate machinery plus T7 RNA polymerase for strong transcription from T7 promoter templates. |
| Salts / Buffer | Potassium Glutamate | Helps set intracellular-like ionic conditions that support enzyme activity, ribosome function, and overall reaction performance. |
| | HEPES-KOH pH 7.5 | Maintains reaction pH in the range needed for stable transcription-translation activity. |
| | Magnesium Glutamate | Supplies Mg²⁺, an essential cofactor for ribosomes, polymerases, and many ATP-dependent enzymes. |
| | Potassium phosphate monobasic | Contributes phosphate and helps maintain buffer balance together with the dibasic form. |
| | Potassium phosphate dibasic | Works with the monobasic form to maintain phosphate buffering and reaction stability. |
| Energy / Nucleotide System | Ribose | Supports nucleotide metabolism and regeneration pathways rather than serving as the main energy source. |
| | Glucose | Serves as a metabolic energy substrate that helps regenerate ATP through endogenous lysate metabolism. |
| | AMP | Acts as a nucleotide monophosphate precursor that can be phosphorylated into higher-energy adenine nucleotides. |
| | CMP | Acts as a nucleotide precursor that can be converted into CTP for transcriptional needs. |
| | GMP | Acts as a nucleotide precursor that can be converted into GTP for transcription and translation-related processes. |
| | UMP | Acts as a nucleotide precursor that can be converted into UTP for transcriptional needs. |
| | Guanine | Serves as a salvage precursor for guanine nucleotide synthesis. |
| Translation Mix (Amino Acids) | 17 Amino Acid Mix | Provides most of the amino acid building blocks required for protein synthesis. |
| | Tyrosine | Provides a required amino acid for translation and may also be supplied separately because of formulation or pathway-specific needs. |
| | Cysteine | Provides a required amino acid for translation and is often added separately because of its chemical instability. |
| Additives | Nicotinamide | Serves as a precursor for NAD-related cofactors that support extract redox metabolism. |
| Backfill | Nuclease Free Water | Brings the reaction to the target volume without introducing nucleases or contaminants. |
Question 2
The 1-hour PEP-NTP system supplies fully activated NTPs and high-energy phosphate (PEP) upfront, enabling fast, high-rate transcription and translation but with limited longevity due to rapid energy depletion.
In contrast, the 20-hour NMP-ribose-glucose system relies on metabolic regeneration, using NMPs and simple substrates (ribose, glucose) that are enzymatically converted into active nucleotides and ATP, trading peak speed for sustained, longer-duration protein production.
Part C: Planning the Global Experiment | Cell-Free Master Mix Design
a. Superfolder GFP (sfGFP)
Description: a basic (constitutively fluorescent) green fluorescent protein published in 2005, derived from Aequorea victoria. It is reported to be a very rapidly-maturing weak dimer.
sfGFP has very efficient folding and fast maturation (~13 min), allowing it to produce fluorescence quickly and reliably even under suboptimal cell-free conditions. This makes it ideal for early and robust readout.
b. Monomeric Red Fluorescent Protein 1 (mRFP1)
mRFP1: Derived from DsRed, mRFP1 has slow maturation and lower photostability, which delays fluorescence signal and reduces effective brightness in short or energy-limited cell-free reactions.
c. mKusabira-Orange2 (mKO2)
mKO2 has moderate maturation speed but higher sensitivity to photobleaching and environmental conditions, which can reduce signal stability during long incubations or repeated excitation. This protein is relatively acid-sensitive (higher pKa), so its fluorescence can decrease if the cell-free reaction acidifies over time, affecting signal stability.
d. mTurquoise2
This protein has an exceptionally high quantum yield and photostability, making it one of the brightest CFP variants and ideal for strong signal readout even at low expression levels.
e. mScarlet-I
mScarlet-I is optimized for high brightness and improved maturation efficiency among red FPs, enabling stronger signal compared to earlier RFPs, though maturation still limits very early readouts compared to GFP variants.
f. Electra2
As a newer engineered FP (likely optimized variant), its performance is typically influenced by trade-offs between brightness, folding efficiency, and maturation kinetics, meaning signal output depends strongly on how well it folds and matures in the cell-free environment.
Question 2
Hypothesis: For mKO2, increasing the HEPES-KOH buffer concentration and maintaining sufficient glucose in the cell-free mastermix will improve fluorescence over a 36-hour incubation by reducing pH drift and sustaining ATP regeneration.
Rationale: Because mKO2 is relatively acid-sensitive, stronger pH buffering should help preserve fluorescence, while sustained glucose-dependent energy regeneration should support continued protein expression and chromophore maturation, resulting in a higher final fluorescence signal.
Small caveat: glucose can also contribute to acidification depending on the metabolism of the lysate, so the strongest version is really HEPES-KOH + controlled glucose, not just “more glucose.”
Question 3
sfGFP → system calibration (TX-TL health)
Melanin has a broad absorbance spectrum, but it absorbs much more strongly at shorter wavelengths (blue/green) than at longer wavelengths (red). This interferes with the optical readout: the fluorescence is measured in a reaction that is simultaneously getting darker, so the signal is attenuated across a broad range of wavelengths.
mScarlet-I → expression readout for melA tyrosinase specifically
Its fluorescence is less sensitive to melanin, so it better tracks expression alone (sfGFP: Ex ~488 nm / Em ~510 nm, high overlap with melanin absorbance; mTurquoise2: even more overlap (blue region); mScarlet-I: Ex ~569 nm / Em ~594 nm, less overlap).
Question 4
To optimize the Master Mix design for mScarlet-I in my melA tyrosinase cell-free system, I’d supplement CuSO4, since my analyte is a copper-dependent enzyme; HEPES-KOH pH 7.5, as additional buffering against acidification; and magnesium glutamate, to improve translation capacity.
At first I thought about adding glucose since it could extend energy regeneration, but then I realized that it may also increase acidification. Since I am concerned about the fluorescence readout in a pigment-producing system, I’d prioritize pH stability over extra glucose.
I’d also supplement L-tyrosine, which serves as a functional validation that my protein of interest, MelA tyrosinase, is being expressed and is active.
Master Mix designs to be tested using mScarlet-I and sfGFP:
REACTION 1
My preparation before receiving the email (sent to the address registered on the Forum) with my personal link to participate in the Cell-Free Master Mix Cloud Lab Global Experiment:
my melA-tyrosine cell-free system
mScarlet-I
| Supplement | Volume | Purpose |
| --- | --- | --- |
| HEPES-KOH pH 7.5 | 1.0 µL | Buffer against pH drift over 36h, helping preserve mScarlet-I fluorescence and MelA activity. |
| L-tyrosine | 0.75 µL | Provides additional substrate for MelA-driven melanin-like pigment production. |
| CuSO4, very low concentration | 0.25 µL | Supports MelA tyrosinase activity as a copper-dependent enzyme while minimizing toxicity/inhibition. |
Increasing buffering capacity with HEPES-KOH also seems like a good idea because prolonged cell-free reactions coupled with melanin production can lead to progressive acidification, which can reduce fluorescent protein signal, impair MelA activity, and shorten the productive lifetime of the TX-TL system.
REACTION 2
my melA-tyrosine cell-free system
sfGFP
| Supplement | Volume | Purpose |
| --- | --- | --- |
| HEPES-KOH pH 7.5 | 1.0 µL | Buffer against pH drift over 36h, helping preserve sfGFP fluorescence and MelA activity. |
| L-tyrosine | 0.75 µL | Provides additional substrate for MelA-driven melanin-like pigment production. |
| CuSO4, very low concentration | 0.25 µL | Supports MelA tyrosinase activity as a copper-dependent enzyme while minimizing toxicity/inhibition. |
Increasing buffering capacity with HEPES-KOH also seems like a good idea because prolonged cell-free reactions coupled with melanin production can lead to progressive acidification, which can reduce fluorescent protein signal, impair MelA activity, and shorten the productive lifetime of the TX-TL system.
REACTION 3
my melA-tyrosine cell-free system
mScarlet-I
| Reagent | Volume | Purpose |
| --- | --- | --- |
| L-tyrosine | 0.8 µL | Direct substrate for MelA pigment production |
| HEPES-KOH pH 7.5 | 0.6 µL | Reduces pH drift over 36h |
| Magnesium glutamate | 0.4 µL | Supports sustained transcription-translation |
| Low CuSO4 | 0.2 µL | Supports tyrosinase catalytic activity |
Copper is required as a cofactor for MelA tyrosinase activity, but it must be carefully controlled because excess Cu²⁺ can inhibit cell-free expression and promote nonspecific oxidative reactions. I therefore decided to test a reduced copper volume and to supplement magnesium glutamate, since it improves TX-TL capacity by supporting ribosomes, RNA polymerase, and Mg-ATP/GTP chemistry.
REACTION 4
my melA-tyrosine cell-free system
sfGFP
| Reagent | Volume | Purpose |
| --- | --- | --- |
| L-tyrosine | 0.8 µL | Direct substrate for MelA pigment production |
| HEPES-KOH pH 7.5 | 0.6 µL | Reduces pH drift over 36h |
| Magnesium glutamate | 0.4 µL | Supports sustained transcription-translation |
| Low CuSO4 | 0.2 µL | Supports tyrosinase catalytic activity |
REACTION 5
my melA-tyrosine cell-free system
mScarlet-I
| Reagent | Volume | Purpose |
| --- | --- | --- |
| HEPES-KOH pH 7.5 | 1.25 µL | Stronger buffering against pH drift over 36h. |
| Low CuSO4 | 0.25 µL | Enables MelA tyrosinase activity as a copper-dependent enzyme. |
| Nuclease-free water | 0.50 µL | Keeps total supplement volume at 2 µL without adding more substrate. |
This reaction tests whether the main limitation is pH stability + copper availability, rather than additional tyrosine. It is useful because the base mastermix already contains tyrosine, so this condition asks whether MelA can produce pigment when copper is supplied and pH is stabilized without further increasing substrate concentration.
REACTION 6
my melA-tyrosine cell-free system
sfGFP
| Reagent | Volume | Purpose |
| --- | --- | --- |
| HEPES-KOH pH 7.5 | 1.25 µL | Stronger buffering against pH drift over 36h. |
| Low CuSO4 | 0.25 µL | Enables MelA tyrosinase activity as a copper-dependent enzyme. |
| Nuclease-free water | 0.50 µL | Keeps total supplement volume at 2 µL without adding more substrate. |
This reaction tests whether the main limitation is pH stability + copper availability, rather than additional tyrosine. It is useful because the base mastermix already contains tyrosine, so this condition asks whether MelA can produce pigment when copper is supplied and pH is stabilized without further increasing substrate concentration.
REACTION 7
my MelA-tyrosine cell-free system
sfGFP
| Reagent | Volume | Purpose |
| --- | --- | --- |
| L-tyrosine | 1.50 µL | Pushes substrate availability to test whether pigment formation is substrate-limited. |
| CuSO4, very low concentration | 0.25 µL | Enables MelA catalytic activity. |
| HEPES-KOH pH 7.5 | 0.25 µL | Minimal pH support. |
This is the pigment-stress condition: it intentionally pushes melanin production to test whether sfGFP fluorescence collapses when the reaction darkens. If sfGFP drops while pigment rises, that supports using mScarlet-I as the better reporter.
REACTION 8
my MelA-tyrosine cell-free system
sfGFP or mScarlet-I
| Reagent | Volume | Purpose |
| --- | --- | --- |
| HEPES-KOH pH 7.5 | 1.50 µL | Strongly buffers against acidification over 36h. |
| CuSO4, very low concentration | 0.25 µL | Enables MelA activity. |
| L-tyrosine | 0.25 µL | Keeps substrate present but avoids overloading the system. |
This is the long-incubation preservation condition: it tests whether the best 36h outcome comes not from maximizing substrate, but from preventing reaction decay. If fluorescence and pigment both remain stronger at 36h, pH stability is the key design variable.
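As a quick sanity check on the brainstormed recipes above, the supplement volumes can be summed per reaction. This is only a minimal sketch: the volumes are copied from the Reaction 1-8 tables, while the names `RECIPES`, `total_volume_ul`, and `within_budget` are hypothetical, and applying the 2 µL supplement budget (stated explicitly only for Reactions 5 and 6) to every reaction is my assumption.

```python
# Supplement volumes in µL, copied from the Reaction 1-8 tables above.
# Paired reactions (1/2, 3/4, 5/6) share a recipe and differ only in reporter.
# Assumption of this sketch: the 2 µL supplement budget stated for
# Reactions 5 and 6 applies to every reaction.
SUPPLEMENT_BUDGET_UL = 2.0

RECIPES = {
    "Reactions 1-2": {"HEPES-KOH pH 7.5": 1.0, "L-tyrosine": 0.75, "CuSO4": 0.25},
    "Reactions 3-4": {
        "L-tyrosine": 0.8,
        "HEPES-KOH pH 7.5": 0.6,
        "Magnesium glutamate": 0.4,
        "CuSO4": 0.2,
    },
    "Reactions 5-6": {"HEPES-KOH pH 7.5": 1.25, "CuSO4": 0.25, "Nuclease-free water": 0.50},
    "Reaction 7": {"L-tyrosine": 1.50, "CuSO4": 0.25, "HEPES-KOH pH 7.5": 0.25},
    "Reaction 8": {"HEPES-KOH pH 7.5": 1.50, "CuSO4": 0.25, "L-tyrosine": 0.25},
}


def total_volume_ul(recipe):
    """Sum the supplement volumes, rounded to avoid floating-point noise."""
    return round(sum(recipe.values()), 3)


def within_budget(recipe, budget=SUPPLEMENT_BUDGET_UL):
    """Return True when the recipe adds up to exactly the supplement budget."""
    return total_volume_ul(recipe) == budget


for name, recipe in RECIPES.items():
    status = "ok" if within_budget(recipe) else "check volumes"
    print(f"{name}: {total_volume_ul(recipe)} uL ({status})")
```

A check like this is mainly useful when editing a recipe: changing one supplement volume without adjusting another would show up immediately as a budget mismatch.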
My actual experiments submitted
Now that I’ve seen the interface better, I understand that the goal here is to focus on DNA construct performance, so I’ll treat this as an expression/readout experiment rather than enzyme validation.
Went too far into broader bioprocess hypotheses 😅 in my brainstorm composition hypothesis above.
Given the broader objective of optimizing the cell-free master mix for maximal fluorescence across six proteins, I will test the 2 reporters:
mScarlet-I = better reporter under melanin/dark pigment interference
sfGFP = system health / pigment-interference control
In this first round I will test these 8 reactions, shown as a table followed by the written hypotheses.
Same as Reaction 7, but tests sfGFP under melanin-producing conditions.
| Reaction | Hypothesis |
| --- | --- |
| 1 | Low HEPES and low tyrosine will provide a baseline fluorescence condition for comparison across proteins. |
| 2 | The same low HEPES / low tyrosine condition will reveal whether sfGFP is more sensitive to pigment-related interference than mScarlet-I. |
| 3 | Increasing HEPES will improve fluorescence over 36h by reducing pH drift. |
| 4 | Increasing HEPES will help determine whether pH stabilization benefits sfGFP fluorescence under the same conditions. |
| 5 | Increasing tyrosine will test whether extra substrate/pigment formation reduces fluorescence through optical interference. |
| 6 | High tyrosine with sfGFP will test whether green fluorescence is especially affected by pigment accumulation. |
| 7 | Combining HEPES, tyrosine, and magnesium glutamate will improve fluorescence by supporting pH stability, substrate context, and TX-TL capacity. |
| 8 | The same combined condition with sfGFP will test whether translation support and buffering can preserve fluorescence despite stronger pigment-forming conditions. |
Hypothesis: Under minimal buffering and substrate availability, both melanin production and mScarlet-I fluorescence will be limited, providing a baseline to compare improvements from other conditions.
Hypothesis: This condition mirrors Reaction 1 but uses sfGFP to evaluate baseline fluorescence without strong pigment production, serving as a reference for how each reporter behaves under minimal conditions.
Hypothesis: Increasing buffering capacity with HEPES-KOH will improve mScarlet-I fluorescence over 36 hours by reducing pH drift, even without increasing substrate availability.
Hypothesis: This condition mirrors Reaction 3 but uses sfGFP to test whether stronger buffering preserves green fluorescence, or if signal is still affected by pigment formation and optical interference.
Hypothesis: Increasing tyrosine concentration will enhance melanin-like pigment production, indicating that MelA activity may be limited by substrate availability under baseline conditions.
Hypothesis: This condition mirrors Reaction 5 but uses sfGFP to evaluate whether increased pigment formation interferes with green fluorescence, compared to the red-shifted mScarlet-I signal.
Testing: pH drift, substrate limitation, and TX-TL capacity
Hypothesis: Combining buffering (HEPES-KOH), substrate availability (tyrosine), and translation support (magnesium glutamate) will help sustain melanin production and mScarlet-I fluorescence over 36 hours by addressing the main system bottlenecks.
Hypothesis: This condition mirrors Reaction 7 but uses sfGFP to evaluate how green fluorescence behaves under melanin-producing conditions, serving as a control to assess pigment interference relative to mScarlet-I.
For Week 2, the wet-lab component was optional for CLs with lab access, which unfortunately was not my case. I completed and documented the in-silico design and written assignments on my Homework Week 2 page.
Week 3 Lab: Lab Automation
For Week 3, CLs were required to create and document the Opentrons Python script, answer the post-lab questions, and submit final project ideas. I completed the code-based assignment and documentation on my Homework Week 3 page, but I did not have access to run the script physically on an Opentrons robot.
For this week, I completed what was expected from the CL side: the conceptual and design-oriented homework around PCR, Gibson cloning, transformation, Golden Gate Assembly, Benchling modeling, and Asimov Kernel exercises. Since I did not have access to a physical lab, I did not perform the wet-lab workflow, but my Week 6 Homework Documentation covers the main principles behind the lab assignment.
Week 7 Lab: Neuromorphic Circuits
For this lab, the physical wet-lab component was not something I had access to as a CL. Kindly check my Week 7 Homework Documentation.
Week 9 Lab: Cell Free Systems
For Week 9, I completed the required CL homework components, including the general cell-free systems questions, lecturer-specific questions, and final project planning. Kindly check my Week 9 Homework page.
Week 10 Lab: Mass Spectrometry
For Week 10, I completed the CL homework requirements based on the provided lab screenshots/data as allowed in the homework instructions. Kindly check my Week 10 Homework Documentation.
Week 11 Lab: Cloud Laboratories Homework & Lab
As a CL without access to a physical lab, I completed the Week 11 cloud lab assignment through the design and documentation components. I contributed to the collective artwork, described the cell-free reaction components, compared the master mix strategies, and submitted reaction designs for the global experiment, focusing on mScarlet-I and sfGFP readouts with HEPES-KOH, tyrosine, and magnesium glutamate. Kindly check my documentation for Week 11 Homework.
Week 12 Lab: Bioproduction of Beta-Carotene and Lycopene
Post Lab Questions (Mandatory for All Students)
1) Which genes when transferred into E. coli will induce the production of lycopene and beta-carotene, respectively? According to the lab instructions, lycopene production in E. coli is induced by transferring the three genes from Erwinia herbicola: crtE, crtI, and crtB. These genes convert FPP into lycopene. Beta-carotene production uses the same pathway with the addition of crtY, which enables conversion toward beta-carotene.
2) Why do the plasmids that are transferred into the E. coli need to contain an antibiotic resistance gene? The antibiotic resistance gene allows selection for E. coli cells that successfully received the plasmid. Only transformed cells can grow on antibiotic-containing media.
3) What outcomes might we expect to see when we vary the media, presence of fructose, and temperature conditions of the overnight cultures? Different media composition and temperatures can affect both cell growth and pigment production. Richer media may increase biomass, fructose may improve lycopene production by changing carbon metabolism, and lower temperature may reduce stress or improve pathway performance, while 37°C may favor faster growth. Based on the lab framing, fructose is being tested because it may improve biomass yield and recombinant gene expression in E. coli. If it improves carbon flux or reduces metabolic stress, pigment production per culture may increase. However, the final result would need to be normalized by OD600 to distinguish higher pigment production from simply higher cell growth.
4) Generally describe what “OD600” measures and how it can be interpreted in this experiment. OD600 measures how much light at 600 nm is scattered by a bacterial culture. As the number of cells increases, the culture becomes more turbid, meaning it scatters more light and gives a higher OD600 value. The 600 nm wavelength is commonly used because it estimates cell density without strongly overlapping with many biological pigments or media components. In this experiment, OD600 helps estimate how much bacterial growth occurred under each condition. This is important because pigment absorbance alone could be misleading: a darker sample might have more pigment simply because it has more cells. By normalizing pigment absorbance by OD600, we can compare carotenoid production per amount of bacterial growth.
5) What are other experimental setups where we may be able to use acetone to separate cellular matter from a compound we intend to measure? Acetone can be used in experiments where we want to separate an organic-soluble compound from the rest of the cell material. In this lab, it helps extract carotenoid pigments from bacterial pellets while leaving much of the cellular debris behind. Similar setups could include extracting chlorophyll or carotenoids from algae and plant tissue, recovering hydrophobic metabolites from microbial cultures, or preparing pigment extracts before absorbance measurements. It could also be useful as a cleanup step, because acetone can precipitate proteins and help remove cell debris before analyzing small molecules by absorbance, fluorescence, or chromatography.
6) Why might we want to engineer E. coli to produce lycopene and beta-carotene pigments when Erwinia herbicola naturally produces them? Even though Erwinia herbicola naturally produces these pigments, E. coli is a better model organism and engineering chassis. It is easier to grow, transform, measure, and genetically manipulate, with well-characterized plasmids, promoters, selection markers, and growth conditions. This makes it more useful for rapid prototyping, pathway optimization, and controlled bioproduction experiments. Engineering E. coli also lets us isolate and test the carotenoid pathway in a standardized host, instead of working with the natural producer where regulation and metabolism may be more difficult to control.
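The OD600 normalization described in question 4 can be sketched in a few lines. This is an illustration only: the function name `pigment_per_od`, the optional blank correction, and the two example readings are hypothetical and are not data from the lab.

```python
def pigment_per_od(pigment_absorbance, od600, blank_od600=0.0):
    """Return pigment absorbance divided by blank-corrected OD600.

    This expresses pigment signal per unit of bacterial growth, so that
    cultures with different cell densities can be compared.
    """
    corrected_od = od600 - blank_od600
    if corrected_od <= 0:
        raise ValueError("Blank-corrected OD600 must be positive")
    return pigment_absorbance / corrected_od


# Hypothetical readings, for illustration only (not measured values).
samples = {
    "LB, 37C": {"pigment": 0.60, "od600": 3.0},
    "LB + fructose, 30C": {"pigment": 0.50, "od600": 2.0},
}

normalized = {
    name: pigment_per_od(reading["pigment"], reading["od600"])
    for name, reading in samples.items()
}

for name, value in normalized.items():
    print(f"{name}: {value:.3f} pigment absorbance per OD600 unit")
```

In this made-up example the first culture has the higher raw pigment absorbance, but the second has more pigment per unit of growth, which is exactly the distinction the normalization is meant to reveal.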
Post Lab Questions (For Committed Listeners)
1.1) What are the enzymes of the carotene pathway?
| Enzyme | Gene | Role |
| --- | --- | --- |
| GGPP synthase | crtE | Converts FPP into geranylgeranyl diphosphate, GGPP |
| Phytoene synthase | crtB | Condenses GGPP molecules to form phytoene |
| Phytoene desaturase | crtI | Converts phytoene into lycopene |
| Lycopene cyclase | crtY | Converts lycopene into beta-carotene |
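The enzyme table above can also be read as an ordered chain of conversions. The sketch below is a simplified model under my own assumptions: it treats the pathway as strictly linear starting from FPP, and the names `PATHWAY` and `final_product` are hypothetical. It only shows which end product a given set of transferred genes would reach in that simplified picture.

```python
# Each step: (gene, enzyme, substrate, product), in pathway order,
# taken from the enzyme table above.
PATHWAY = [
    ("crtE", "GGPP synthase", "FPP", "GGPP"),
    ("crtB", "Phytoene synthase", "GGPP", "phytoene"),
    ("crtI", "Phytoene desaturase", "phytoene", "lycopene"),
    ("crtY", "Lycopene cyclase", "lycopene", "beta-carotene"),
]


def final_product(genes_present, start="FPP"):
    """Walk the linear pathway and return the last metabolite reachable.

    Simplifying assumption: a step runs only if its gene is present and
    its substrate is the current metabolite; the walk stops at the first
    missing step.
    """
    metabolite = start
    for gene, _enzyme, substrate, product in PATHWAY:
        if gene not in genes_present or substrate != metabolite:
            break
        metabolite = product
    return metabolite


print(final_product({"crtE", "crtB", "crtI"}))
print(final_product({"crtE", "crtB", "crtI", "crtY"}))
```

With the three lycopene genes the walk ends at lycopene, and adding crtY extends it to beta-carotene, matching the answer to the first mandatory question.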
1.2) Within this pathway, which is the rate determining step (the step that takes the longest)? Which enzyme is responsible for this step?
Within the carotenoid pathway, my hypothesis is that the likely rate-determining step is the conversion of phytoene into lycopene, catalyzed by CrtI, the phytoene desaturase.
The reason is that crtE and crtB first build the upstream carotenoid intermediate: CrtE helps produce GGPP, and CrtB converts GGPP into phytoene. Then CrtI carries out the desaturation steps that convert phytoene into lycopene. Since this step involves multiple oxidation/desaturation reactions, I would expect it to be slower and more limiting than the upstream condensation steps.
The literature supports this hypothesis, but also shows that CrtI is probably not the only bottleneck. Du et al. 2016 confirm that E. coli requires crtE, crtB, and crtI to produce lycopene, and they show that fructose strongly improves lycopene production by changing central metabolism, especially pathways linked to precursor, cofactor, and energy supply. So I would identify CrtI/crtI as the most likely pathway-level enzymatic bottleneck, while recognizing that whole-cell lycopene production also depends on upstream metabolic supply. This is also consistent with Aristidou, San and Bennett 2008, who show that fructose can reduce acetate overflow and improve biomass/recombinant expression in E. coli, suggesting that fructose supports a more favorable metabolic state for bioproduction than glucose under these conditions.
2) Notes for design of a DNA construct for bioproduction
2.1) The first thing to do is to decide what organism you are going to use for this (E. coli or S. cerevisiae) for production. Which would you choose and why (emphases on production differences)?
I would choose E. coli. S. cerevisiae could be useful for more complex eukaryotic engineering or when compartmentalization and eukaryotic metabolism are advantageous, but for fast carotenoid pathway testing, I think E. coli is the more practical chassis.
| Criterion | E. coli | S. cerevisiae |
| --- | --- | --- |
| Growth speed | Very fast growth, useful for rapid testing | Slower growth compared to E. coli |
| Genetic engineering | Easy plasmid transformation and many standardized tools | Strong engineering tools, but usually more complex |
| Pathway prototyping | Well suited for quick testing of pathway designs | Better for longer-term strain engineering |
| Production context | Directly supported by the lab setup using pAC-LYC and pAC-BETA plasmids | Would require a different design strategy, usually genome integration |
| Metabolism | Good bacterial chassis for recombinant pathway expression | Useful when eukaryotic metabolism or compartmentalization matters |
| Literature support | The referenced papers directly use E. coli for fructose-based recombinant expression and lycopene production | Not the system tested in these papers |
2.2) Now choose one of the enzymes and lets outline the parts of the construct for expression
I would choose the phytoene desaturase, encoded by crtI, because it catalyzes the conversion of phytoene into lycopene and may be one of the key pathway-level bottlenecks in lycopene production.
| Construct part | Example / choice | Function |
| --- | --- | --- |
| Promoter | Tunable inducible promoter, such as pBAD or lac-based promoter | Controls when and how strongly crtI is transcribed |
| Operator | Depends on promoter system | Allows regulation by an inducer or repressor |
| RBS | Bacterial ribosome binding site | Controls translation initiation and affects CrtI protein level |
| Coding sequence | crtI | Encodes phytoene desaturase, the enzyme that converts phytoene into lycopene |
| Terminator | Strong bacterial terminator | Stops transcription and prevents read-through |
| Origin of replication | Medium-copy origin | Allows plasmid replication while limiting metabolic burden |
| Antibiotic resistance marker | Chloramphenicol or another selectable marker | Allows selection of cells carrying the plasmid |
A minimal plasmid design would be: Origin of replication, antibiotic resistance marker, promoter, operator, RBS, crtI, terminator.
If the goal were only to test crtI expression, this construct would be enough. But if the goal is full lycopene production, crtI would need to be expressed together with the upstream pathway genes crtE and crtB, because E. coli requires crtE, crtB, and crtI to synthesize lycopene. For beta-carotene production, crtY would also be included.
2.3.i.1.a.i) What is the function of a promoter? The promoter is the DNA region that initiates transcription of the gene of interest. It controls RNA polymerase binding and therefore strongly affects when, where, and how much of the target enzyme is produced. In bacteria, promoter recognition depends on bacterial RNA polymerase and sigma factors, so the promoter must be compatible with a prokaryotic host like E. coli. Source: Educational Resources > Molecular Biology Reference > Promoters.
2.3.i.1.a.ii) What types of promoters do we have? Promoters can be grouped by their expression behavior. Constitutive promoters are active continuously, inducible promoters are turned on or increased by a signal such as IPTG, lactose, arabinose, heat, or light, and repressible promoters are turned off or reduced in response to a signal or metabolite. Common bacteria promoter examples are included in the table below.
| Promoter type | Description | Mechanism | Examples from Addgene |
| --- | --- | --- | --- |
| Constitutive | Active by default / continuously drives expression | RNA polymerase can initiate transcription without needing a specific induction signal | T7, Sp6. Note: T7 requires T7 RNA polymerase |
| Inducible | Expression increases or turns on after a signal/inducer | Either removes repression or activates transcription | |
| Repressible | Expression decreases or turns off in response to a signal/metabolite | A metabolite or co-repressor enables repression of transcription | trp promoter is repressed by tryptophan |
2.3.i.1.a.iii) If we wanted to turn off the transcription of a gene in response to a metabolite, what type of promoter would be most useful? What if we wanted this to increase in the presence of the metabolite? To turn transcription off in response to a metabolite, I would use a repressible promoter, such as the trp promoter, where high tryptophan represses transcription. To increase transcription in response to a metabolite, I would use an inducible promoter, such as lac/IPTG or araBAD/arabinose, where the inducer activates expression or removes repression.
2.3.i.1.a.iv) Now choose one of the genes of the metabolic pathway previously described (Carotene/lycopene )and choose one enzyme to make an expression construct. What promoter could you use for this? Why did you choose it? I would choose crtI, which encodes phytoene desaturase, the enzyme that converts phytoene into lycopene. I chose this gene because this step is a good candidate for pathway-level control: if CrtI expression is too low, phytoene may accumulate and lycopene output may remain limited.
For the promoter, I would use a tunable inducible bacterial promoter, such as pBAD/araBAD or lac/IPTG. I would prefer pBAD/araBAD for an initial design because arabinose-inducible expression allows controlled activation of the gene. The reason I would not immediately use a strong constitutive promoter is that carotenoid production can create metabolic burden. The goal is not simply to express crtI as strongly as possible, but to tune expression and find the level that improves lycopene production without compromising cell growth.
Therefore, a minimal expression cassette would be: pBAD promoter, RBS, crtI, terminator.
In the full plasmid context: Origin of replication, antibiotic resistance marker, pBAD promoter, RBS, crtI, terminator.
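The part order listed above can be written down as a small data structure and checked automatically. This is a sketch under my own assumptions: the `Part` class, the role labels, and `cassette_is_ordered` are hypothetical names, and the ori and resistance marker are treated as single opaque parts even though a real marker carries its own promoter and coding sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str
    role: str  # e.g. "ori", "marker", "promoter", "rbs", "cds", "terminator"


# Expected 5'->3' order of the expression cassette roles.
CASSETTE_ORDER = ["promoter", "rbs", "cds", "terminator"]


def cassette_is_ordered(parts):
    """True if promoter, RBS, CDS and terminator each appear once, in order.

    Parts with other roles (ori, marker) are ignored by this check.
    """
    roles = [part.role for part in parts if part.role in CASSETTE_ORDER]
    return roles == CASSETTE_ORDER


# Full plasmid context as listed in the text above.
plasmid = [
    Part("Origin of replication", "ori"),
    Part("Antibiotic resistance marker", "marker"),
    Part("pBAD promoter", "promoter"),
    Part("RBS", "rbs"),
    Part("crtI", "cds"),
    Part("Terminator", "terminator"),
]

print(cassette_is_ordered(plasmid))
```

The same list could later be extended with crtE and crtB cassettes for full lycopene production, in which case the check would need to handle more than one cassette.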
3.1.i What is the origin of replication? The origin of replication, or ori, is the DNA sequence where plasmid replication begins. It allows the plasmid to copy itself inside the host cell and be maintained over generations. Together with its control elements, the ori is part of the plasmid replicon. Source: Addgene’s article “Plasmids 101: Origin of Replication”, available here.
3.1.ii What types of origin of replication do we have? Origins of replication differ by copy number, replication control, compatibility group, and host requirements. Copy number affects gene dosage and burden; replication control affects how tightly plasmid replication is regulated; compatibility group matters when using more than one plasmid; and host requirements determine whether the plasmid can replicate in a given strain. Here are some examples from Addgene’s article “Plasmids 101: Origin of Replication”, available here.
| Origin / replicon | Approx. copy number | Replication control | Compatibility group | Host/use note |
| --- | --- | --- | --- | --- |
| pUC / pMB1 derivative | ~500-700 | Relaxed | A | High-copy E. coli plasmids; useful for DNA yield, but can create burden |
| pBR322 / pMB1 | ~15-20 | Relaxed | A | Medium-copy E. coli plasmids; more balanced expression |
| ColE1 | ~15-20 | Relaxed | A | Common E. coli cloning origin |
| p15A / pACYC | ~10 | Relaxed | B | Lower-copy origin; compatible with ColE1/pMB1 plasmids |
| pSC101 | ~5 | Stringent | C | Low-copy origin; useful when stability/low burden matters |
| R6K | ~15-20 | Stringent | C | Requires pir gene for replication |
| CloDF13 / pCDF | ~20-40 | Relaxed | D | Medium-copy origin, useful in multi-plasmid systems |
3.1.iii (Extra) What are compatibility groups? Compatibility groups describe whether two plasmids can be stably maintained in the same bacterial cell. Plasmids with the same or very similar replication/partitioning systems are usually incompatible because they compete for the same replication control machinery. Over time, one plasmid may be lost. This matters if we want to use more than one plasmid in the same E. coli strain: they should have compatible origins, meaning different incompatibility groups. For example, pMB1/ColE1-derived plasmids such as pUC, pBR322, pET, and pGEX are all in compatibility group A, so they should generally not be combined in the same cell. A p15A/pACYC plasmid, group B, could be combined with a ColE1/pMB1 plasmid more safely.
3.1.iv Now for the previously chosen promoter and gene what will be the best origin of replication? For a crtI expression plasmid in E. coli, I would choose a medium-copy origin rather than a very high-copy pUC-type origin. This should provide enough CrtI expression while reducing metabolic burden. If I combine this plasmid with another carotenoid-pathway plasmid, I would choose compatible origins, for example p15A with ColE1/pMB1-derived origins.
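The compatibility rule from 3.1.iii can be expressed as a simple lookup. This is a minimal sketch: the group letters are copied from the origin table above, while the names `ORIGIN_GROUP` and `compatible` are hypothetical, and the rule "different group means compatible" is the simplification used in this document rather than a full model of plasmid incompatibility.

```python
# Compatibility groups copied from the origin-of-replication table above.
ORIGIN_GROUP = {
    "pUC": "A",
    "pBR322": "A",
    "ColE1": "A",
    "p15A": "B",
    "pSC101": "C",
    "R6K": "C",
    "CloDF13": "D",
}


def compatible(ori_a, ori_b):
    """Return True when two origins belong to different compatibility groups."""
    return ORIGIN_GROUP[ori_a] != ORIGIN_GROUP[ori_b]


# A p15A plasmid next to a ColE1-type plasmid, versus two group A plasmids.
print(compatible("p15A", "ColE1"))
print(compatible("pUC", "pBR322"))
```

For a two-plasmid carotenoid design, this kind of lookup makes it easy to confirm that the crtI plasmid and the second pathway plasmid do not share a group before ordering or cloning anything.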
4. Elaborate further on other bioparts like RBS, terminators, operators you would use for a correct design and further bioproduction?
| Element | Example for this construct | Function | Why it matters for bioproduction |
| --- | --- | --- | --- |
| Origin of replication (ori) | Medium-copy bacterial ori, such as pBR322/pMB1-derived ori or p15A | Allows plasmid replication | Copy number affects gene dosage and metabolic burden |
| Antibiotic resistance marker | Chloramphenicol, ampicillin, or kanamycin resistance | Allows selection of cells carrying the plasmid | Ensures that the production strain maintains the construct |
| Promoter / regulatory region | pBAD/araBAD or lac/IPTG-based promoter | Initiates transcription and, if regulated, controls when expression turns on/off | Lets me tune crtI expression instead of forcing constant maximum production |
| Operator / response element | araBAD/AraC or lacO/LacI regulatory sites, if using a regulated promoter | Binding site for regulatory proteins | Enables inducible or repressible control. This is part of the promoter/regulatory region rather than always a separate independent part |
| RBS - Ribosome Binding Site | Bacterial RBS upstream of crtI | Recruits the ribosome to initiate translation | Controls how much CrtI protein is made from the mRNA |
| Coding sequence | crtI | Encodes CrtI / phytoene desaturase | Produces the enzyme that converts phytoene into lycopene |
| Terminator | Strong bacterial transcription terminator | Stops transcription after the coding sequence | Prevents read-through into other plasmid regions and improves construct stability |
| Assembly junctions / cloning sites | Gibson overlaps or Golden Gate overhangs | Enable construction of the plasmid | Allow modular assembly and later swapping of promoters, RBSs, or pathway genes |
| Optional insulators / spacers | Neutral spacer sequences between parts | Reduce unwanted context effects between genetic parts | Can make expression more predictable |
| Optional reporter/control | GFP in a test cassette, or pigment output itself | Helps verify that expression is working | Useful for debugging promoter/RBS behavior before optimizing the full carotenoid pathway |
I did not complete questions 5, 6, 7, and 8, as they were marked as extra-point questions. For this submission, I prioritized the mandatory All Students and Committed Listener sections.
Melanin-based light-recording bioink/biomaterial Designing a MelC2-Based Cell-Free Module for Programmable Melanin Bioink
Reframing pigmentation from static dyeing to a programmable chemical state evolution, enabling materials that encode environmental history
Important links:
| Resource | Link |
| --- | --- |
| Final presentation slides | CL Final Project Slide Deck |
| Final pTwist_MelC2_T7_TXTL_6xHis construct | Benchling |
| Twist Order for my Final Project: MelC2_T7_TXTL_6xHis_expression_cassette | Benchling and Twist (Nodes) Document |
| Cell-free master mix plan - 8 planned reactions | My Week 11 HW Documentation |

SECTION 1: ABSTRACT
Melanin is a chemically heterogeneous dark biopolymer known for broadband UV-visible optical absorption, photoprotective behavior, photothermal conversion, redox activity, and long-term optical stability. These properties make melanin a compelling biological route to functional color: a pigment chemistry that can absorb and dissipate radiation, preserve optical traces, buffer oxidative stress, and interface with biological or electronic systems. This project proposes controlling melanin-forming chemistry in a synthetic biology system to develop a programmable bioink for engineered biomaterials. The broader vision is to create materials that combine biosensing and functional response: recording environmental inputs such as light, ionizing radiation, or oxidative stress through measurable optical change, while also enabling properties such as UV or radiation protection, photothermal conversion, antioxidant behavior, and bioelectronic interfacing. Depending on concentration, matrix composition, and material format, this melanin-based bioink could be explored for responsive textiles, UV-protective coatings, architectural and design surfaces, tattoo-like dermal pigments, space-oriented materials, bioelectronic interfaces, and localized radioprotective biomaterials. To move toward this goal, this project aims to design a first genetic module that generates measurable melanin-like optical changes in a controlled cell-free system, then use it as a foundation for future integration into engineered biomaterials such as bacterial cellulose.
The central hypothesis is that a codon-optimized Streptomyces antibioticus MelC2 tyrosinase construct can provide a tractable route toward cell-free melanin-like pigment formation, with output shaped by tyrosinase activity, substrate availability, copper cofactor loading, oxygen, pH, redox state, and polymerization chemistry. During HTGAA 2026, I designed a MelC2 expression cassette for TX-TL / E. coli use and designed a validation workflow.
Bacteriophage Engineering GROUP MEMBERS: Diogo Custodio; Flo Razoux; Katharine Kolin; Mariana Kanbe; Marisa Satsia.
PROJECT MAIN GOAL: Increase the stability of the L protein
GROUP PROPOSAL: We will use the same workflow as in the previous HW (e.g. mutagenesis) but adapt it to specific aim(s) based on the HW reading material of week 04 (e.g. shorten the L protein so that it no longer depends on the bacterial chaperone DnaJ).
Melanin-based bioink for Light-Recording Materials My individual final project is based on melanin and related compounds in an engineered living material (ELM) as a color-responsive bio-ink. Among many other factors, oxidation state, precursor availability / intermediate reaction pathways likely shape tone and long-term stability and may be modulated using a genetic system, be it a bacterium, a synthetic minimal cell, etc.
Subsections of Projects
Individual Final Project
SECTION 2: PROJECT AIMS
Aim 1: Experimental Aim
Build and validate a first MelC2-based cell-free melanin module
The first aim of this project is to design a codon-optimized Streptomyces antibioticus MelC2 tyrosinase expression cassette for TX-TL / E. coli use and test whether it can generate measurable melanin-like optical changes in a controlled cell-free system. This aim uses DNA design, Benchling assembly, Twist synthesis, fluorescent protein controls, visible darkening, OD 400-500 nm absorbance, SDS-PAGE, and future LC-MS analysis to distinguish protein expression, enzymatic activity, pigment accumulation, and downstream oxidation chemistry.
Aim 2: Development Aim
Optimize the chemical and optical behavior of the melanin-forming system
After validating the first module, the next aim is to optimize the reaction conditions that shape pigment output, including L-tyrosine concentration, copper availability, pH buffering, oxygen exposure, magnesium, incubation time, and reporter choice. This aim will help determine whether the system can be tuned for stronger pigment formation, cleaner optical readouts, and more predictable color response before integration into a material matrix.
Aim 3: Visionary Aim
Develop programmable melanin bioinks for exposure-recording and functional biomaterials
The long-term aim is to integrate the optimized melanin-forming module into bacterial cellulose or other biomaterials to create bio-based surfaces that can both record environmental exposure and respond functionally. If successful, this could support responsive textiles, UV-protective coatings, design surfaces, tattoo-like dermal pigments, bioelectronic interfaces, space-oriented materials, and localized radioprotective biomaterials.
SECTION 3: BACKGROUND
3.1. Peer-reviewed research citations
Melanin is relevant to this project because its material properties extend beyond visible pigmentation. Menichetti et al. 2025 describe melanin photoprotection as a combination of broadband light extinction and antioxidant activity, supporting the idea that melanin-based materials could pair optical response with protection against light-induced damage. Dadachova and Casadevall 2009 further show that melanin changes how biological systems interact with ionizing radiation, with melanized fungi displaying radioprotective behavior and altered electronic properties under radiation exposure. Together, these studies support the central premise of this project: melanin can be treated not only as a pigment, but as a functional material chemistry for exposure-responsive systems.
This material potential has already been explored in several application directions relevant to the proposed bioink. In space-oriented materials, Cordero et al. 2025 showed that fungal melanin-polymer biocomposites exposed to low Earth orbit conditions had improved structural stability and radiation-shielding potential. In photothermal and bioelectronic materials, Yue and Zhao 2021 review how melanin-like materials can convert absorbed optical energy into heat and support sensor or interface applications through redox activity and mixed ionic/electronic behavior. At the skin interface, Park et al. 2024 developed electroactive melanin tattoo inks using naturally derived melanin nanoparticles to reduce skin impedance, suggesting that melanin-based pigments may be useful for dermal bioelectronic interfaces as well as coloration.
The bioink and textile direction also has direct precedent. Walker et al. 2024 engineered cellulose-producing Komagataeibacter rhaeticus to express tyrosinase and grow self-pigmenting bacterial cellulose through melanin biosynthesis, showing that genetically encoded pigmentation can be integrated into a material-producing microbial platform. Ahn et al. 2021 produced melanin-like pigments microbially from caffeic acid and applied the pigment to cotton fabric dyeing, supporting the relevance of microbial melanin as a textile-compatible colorant.
These studies connect directly to this project’s direction, but also clarify its specific contribution: instead of starting with a finished textile, this work first builds a controlled MelC2-based cell-free module to make melanin-like optical output measurable, tunable, and chemically interpretable before later integration into bacterial cellulose or other biomaterial matrices.
3.2. Novelty and innovation
This project is innovative because it uses existing biological tools in a new material context: a MelC2 tyrosinase module is designed not only to produce pigment, but to generate a measurable and tunable optical output. The cell-free system makes this approach modular, allowing key variables such as copper loading, substrate availability, pH, oxygen, redox state, and polymerization conditions to be tested before introducing the system into more complex biomaterial matrices. This creates a controlled bridge between genetic design and material performance.
The project also challenges a common assumption in functional materials: that color, sensing, protection, and responsiveness must be added as separate components. Instead, it asks whether melanin-forming chemistry can be programmed as a single multifunctional layer that records exposure and produces useful material responses. In doing so, the project expands synthetic biology from making biological products toward engineering bio-based materials whose behavior can be designed, measured, and tuned.
3.3. Why the project matters and potential impact
The main ethical issue is not melanin itself, but the form in which the system is built and deployed. A melanin-based material can remain a controlled chemical module, become a non-replicating embedded system, or become part of a living material platform. Each design choice carries a different ethical burden, so the project should progress from the lowest-risk and most interpretable system toward more complex formats only after validation.
| Design choice | Role in the project | Ethical implication |
| --- | --- | --- |
| Cell-free MelC2 module | First experimental platform for testing pigment chemistry | Lowest deployment risk; controlled, non-replicating, and easiest to interpret |
| Non-replicating synthetic minimal cells | Possible future format for localized sensing or pigment production inside a material | Safer than living cells, but requires proof that encapsulation, stability, and output control work |
| Living bacterial cellulose platform | Possible future scaffold for material production and integration | Most powerful material format, but requires stronger containment, characterization, and environmental controls |
For this reason, the current project takes the cell-free route as an ethical and technical starting point. It validates the core chemistry - MelC2 expression, copper loading, substrate availability, pH, oxygen, and pigment formation - before adding living-system or material-scale complexity. This avoids treating a speculative material concept as a deployable product too early.
| Ethical principle | What it means here | Project response |
| --- | --- | --- |
| Responsibility | Color change could be mistaken for a calibrated exposure sensor | Define whether the output is aesthetic color, qualitative exposure record, or quantitative biosensor |
| Non-maleficence | Protective claims could create false confidence if the material is not tested under real exposure conditions | Do not claim UV protection, radioprotection, dermal use, or biomedical function before direct validation |
| Beneficence | The project could reduce material complexity while adding useful functions | Prioritize applications where melanin adds clear value: exposure recording, photoprotection, photothermal response, or oxidative buffering |
| Biosafety / containment | Future versions may involve living or semi-living systems | Start cell-free; prefer non-replicating or purified systems before deployable living materials |
The practical ethical strategy is staged development: first validate pigment chemistry, then test material integration, then evaluate sensing or protective performance under relevant conditions. The main risks are overclaiming protection, treating color change as quantitative sensing too early, or moving into dermal / biomedical contexts before the material is characterized. The project could also be wrong if melanin pigmentation does not correlate reliably with exposure, if pigment chemistry is too variable to control, or if a simpler non-biological sensor performs better. Alternatives such as purified enzymes, synthetic melanin-like polymers, or conventional exposure sensors should remain available if they prove safer or more reliable.
SECTION 4: EXPERIMENTAL DESIGN, TECHNIQUES, TOOLS, AND TECHNOLOGY
4.1 Experimental plan and timeline in 15 steps
The experimental plan follows a build-test-learn structure. First, the MelC2 construct is selected, designed, ordered, and validated in silico. Next, the cell-free TX-TL system is tested using fluorescent protein controls. Then, pigment formation, protein expression, and pathway chemistry are validated separately. After the course, the project can move toward light-controlled expression modeling and material integration for a melanin-based light-recording bio-ink.
| Step | Status / timing | Method / tool | Purpose | Expected output |
| --- | --- | --- | --- | --- |
| | | | Build an expression cassette for TX-TL / E. coli use | T7 promoter, RBS, spacer, codon-optimized melC2 CDS, C-terminal 6xHis tag, stop codon, and terminator. Final annotated MelC2 tyrosinase CDS is available in the same Benchling record. |
| 3. Verify protein identity | Completed | BLASTP, conserved-domain analysis | Confirm that the optimized sequence still encodes a canonical tyrosinase | Top hits remain MelC2 / tyrosinase-like proteins with a conserved tyrosinase domain |
| | | | | MelC2 cassette submitted in pTwist Amp High Copy using the Benchling interface |
| 5. Validate TX-TL expression capacity and screen initial reaction variables | 8 first reactions planned here; execution estimated 2-4 days | Cell-free reactions with sfGFP and mScarlet-I fluorescent protein controls | Confirm that the cell-free system supports protein expression and establish baseline reaction conditions | Strong sfGFP or mScarlet-I signal would indicate that the TX-TL system is functional. The planned reaction matrix compares substrate, copper, pH buffering, magnesium, incubation time, and reporter choice. |
| 6. Measure pigment kinetics | Planned; estimated 1-3 days | OD 400-500 nm absorbance | Quantify melanin-like pigment accumulation over time | Increasing absorbance would support pigment formation kinetics |
| 7. Confirm MelC2 protein expression | Planned; estimated 2-3 days | SDS-PAGE / His-tag detection | Distinguish protein expression from pigment formation | A MelC2-sized band would support successful expression, even if pigment output is weak |
| 8. Analyze pathway chemistry | Planned; estimated 1-2 weeks | LC-MS | Track L-tyrosine depletion and L-DOPA / quinone-related intermediates | Confirms enzymatic activity even when visual pigment output is ambiguous |
| 9. Model future light control | Post-course; estimated 1-2 weeks for first modeling round | Asimov Kernel | Model a light-activated expression circuit that could later support gradual tonal change in a material system | Candidate circuit logic for controlling melanin expression in response to light exposure |
| 10. Refine light-control model toward aesthetic and functional goals | | | | |
| | | | Move from biochemical reaction module to material prototype | Spatially localized and stable optical change in a material scaffold |
| 13. Compare material-system architectures | Post-course; iterative | Engineered living material design, bacterial cellulose scaffold testing, hybrid BC / cell-free module systems | Compare different ways of integrating the melanin-producing module into a functional material | Identification of the most promising integration model: engineered K. rhaeticus, bacterial cellulose scaffold with embedded cell-free modules, or a hybrid system |
| | | | Move from simple expression and pigment checks to more refined mechanistic and material-level validation | A staged validation framework for developing a melanin-based light-recording bio-ink |
After the construct is designed, validation proceeds from simple functional checks to more specific analytical readouts. Each step is meant to isolate a different possible failure point: TX-TL expression capacity, MelC2 protein production, pigment accumulation, or pathway-level chemistry.
Previous iGEM tyrosinase projects (see references list at the end of this document) showed that tyrosinase expression can be detectable even when pigment formation is weak or absent. For this reason, protein production, enzyme activity, and optical output need to be validated separately rather than treated as a single result.
Post-course, the next conceptual step is to move this melanin-based light-recording bio-ink forward by modeling light-activated melanin expression in Asimov Kernel. This will help clarify whether the project should first push for tighter molecular control, or move earlier into material-scale experimentation. The longer-term goal is to connect the cell-free MelC2 module to a material system capable of controlled, spatially localized, and visually meaningful pigment formation.
The tables below summarize the validation logic for the cell-free MelC2 module. Each readout acts as a checkpoint, moving from general TX-TL expression capacity to visible pigment output, absorbance kinetics, protein expression, and finally chemical validation.
| Step | Method | Question answered | Expected result | Decision |
| --- | --- | --- | --- | --- |
| 1 | Fluorescent protein control | Is the TX-TL system functional? | Strong sfGFP or mScarlet-I fluorescence | If weak, debug TX-TL before testing MelC2 |
| 2 | Reaction photos | Is there visible darkening over time? | Progressive color change in reaction samples | If absent, continue to OD because pigment may be low-level |
| 3 | OD 400-500 nm | Is pigment accumulating quantitatively? | Absorbance increases over time | If flat, check MelC2 protein expression |
| 4 | SDS-PAGE / His-tag detection | Is MelC2 expressed? | Band near the expected MelC2 size or His-tag signal | If absent, debug construct, expression conditions, or protein stability |
| 5 | LC-MS | Is the pathway chemically active? | L-tyrosine depletion and/or detection of L-DOPA / quinone-related intermediates | If intermediates are absent, investigate folding, copper incorporation, pH, oxygen, substrate availability, or sampling time |
| Observation | Interpretation |
| --- | --- |
| MelC2 detected + OD increase / darkening | Best-case result: protein expression and pigment-forming chemistry are both working |
| MelC2 detected + no OD increase / darkening | Expression works, but enzyme activity, cofactor availability, substrate availability, oxygen, or downstream pigment chemistry may be limiting |
| No MelC2 detected + no pigment | Expression or construct-level failure |
| LC-MS intermediates + weak pigment | Enzyme is active, but pigment polymerization or pigment accumulation is limiting |
| No LC-MS intermediates + no pigment | MelC2 is inactive, absent, or missing required catalytic conditions |
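The observation/interpretation pairs above can be written down as a small lookup function, which makes the decision logic easy to apply consistently across reactions. This is only a sketch: the function name, the boolean inputs, and especially the order in which the checks are applied (protein detection first, then LC-MS) are my assumptions, since the pairs above do not define a precedence when several apply.

```python
from typing import Optional


def interpret(melc2_detected: bool,
              pigment_signal: bool,
              lcms_intermediates: Optional[bool] = None) -> str:
    """Return the interpretation for one reaction.

    pigment_signal: True if OD 400-500 nm increased or visible darkening occurred.
    lcms_intermediates: True/False if LC-MS was run, None if not yet available.
    The precedence of the checks is an assumption, not part of the original logic.
    """
    if pigment_signal:
        if melc2_detected:
            return "Best case: expression and pigment-forming chemistry both work"
        return "Unlisted combination: re-examine controls and detection"
    # From here on there is no (or only weak) pigment signal.
    if not melc2_detected:
        return "Expression or construct-level failure"
    if lcms_intermediates is True:
        return "Enzyme active; polymerization or pigment accumulation limiting"
    if lcms_intermediates is False:
        return "MelC2 inactive or missing required catalytic conditions"
    return ("Expression works; activity, cofactor, substrate, oxygen, "
            "or downstream chemistry may be limiting")


# Example: protein band present, flat absorbance, LC-MS not yet run.
print(interpret(melc2_detected=True, pigment_signal=False))
```

Keeping LC-MS as an optional input mirrors the staged plan: the function still returns a usable interpretation before the chemical analysis is available.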
This staged logic keeps the first aim experimentally interpretable. Color change is treated as one readout among several, while protein expression and LC-MS provide the controls needed to distinguish enzyme production, catalytic activity, and downstream pigment chemistry.
An initial version of a visual workflow diagram for this validation logic, generated using ChatGPT, can be found in my Brainstorms documentation.
4.2 Techniques relevant to this project
The checked techniques reflect the parts of the project that were used or directly planned, from MelC2 construct design and cell-free expression to validation readouts and future automated testing.
Pipetting
Pipetting
Lab Safety
Bioethical Considerations
Pipetting, lab safety, and bioethical considerations are relevant because the project depends on careful preparation of small-volume cell-free reactions and controlled handling of reagents such as L-tyrosine, CuSO4, buffers, and DNA constructs. These techniques also support the project’s staged design: testing melanin-forming chemistry in a contained, non-deployable system before moving toward biomaterial applications (lab safety and bioethical consideration).
DNA Gel Art
DNA Sequencing
DNA Editing
DNA Construct Design
Restriction Enzyme Digestion
Gel Electrophoresis
DNA Purification From Gel
Databases, e.g. GenBank, NCBI, Ensembl, and UCSC Genome Browser
DNA Gel Art / construct design was selected because the project required designing and verifying a MelC2 expression cassette. DNA construct design and databases are checked because Benchling, UniProt, BLASTP, and sequence databases were used to select, optimize, and verify the MelC2 construct. DNA sequencing, DNA editing, restriction digestion, gel electrophoresis, and gel purification are unchecked because they were not performed in this stage. However, they may become relevant after synthesis if the construct needs to be sequence-verified, edited, digested, visualized on a gel, or purified before downstream expression tests.
Bioproduction
Bioproduction
Chassis Selection, e.g. TX-TL / E. coli context
Registry of Standard Biological Parts
Plasmid Preparation
Bacterial Culturing
Quality Control / Analysis
Bacterial Processing, e.g. centrifugation, lysis, DNA purification
Bioproduction was selected because the project aims to produce a functional biological output: MelC2-driven melanin-like pigmentation. Chassis selection is checked because the first expression context is TX-TL / E. coli, and the Registry of Standard Biological Parts informed the expression design. Quality control / analysis is checked because the workflow uses fluorescence, OD 400-500 nm, SDS-PAGE, and future LC-MS to validate expression, activity, and pigment output. Plasmid preparation, bacterial culturing, and bacterial processing are unchecked because this stage uses a synthesized construct and cell-free validation rather than live-cell propagation or processing.
Lab Automation
Creating Code for Laboratory Automation
Using Liquid Handling Robots, e.g. Opentrons
Designing a Twist Order
Creating a plan to use the Autonomous Lab at Ginkgo Bioworks
Lab automation was selected because the project includes automated experimental planning for the next validation stage. I checked code for laboratory automation, Twist order design, and Ginkgo Bioworks planning because I prepared the construct for synthesis and began designing a reaction matrix to test variables such as copper, tyrosine, buffering, magnesium, and reporter choice. I left liquid handling robots unchecked because I did not directly operate an Opentrons or similar robot in this stage.
Protein Design
Protein Design
Use of Boltz or PepMLM
Use of Asimov Kernel
Use of Benchling
Models and Notebooks
Databases
Protein design was selected because the project depends on choosing, analyzing, and eventually controlling a melanin-forming enzyme. I checked Benchling, databases, models, and notebooks because they were used to select MelC2, inspect sequence/function, support construct design, and analyze protein behavior. Asimov Kernel is checked because it was explored for future light-responsive control of MelC2 expression. Boltz and PepMLM are unchecked because they were not used in this stage.
Cell-Free Systems
Cell-Free Reactions
Freeze-Dried Cell Free Systems
miniPCR Tools
Protein Purification
Cell-free systems were selected because the first experimental goal is to test MelC2 expression and melanin-like pigment formation in a controlled, non-replicating format. Cell-free reactions and freeze-dried cell-free systems are checked because they are the planned platform for validating the module. miniPCR and protein purification are unchecked because this stage does not require PCR amplification or purified MelC2 protein.
Gibson Assembly
Primer Design or Selection
PCR Reactions
Gibson Assembly
Other Cloning Methods, e.g. Restriction Enzyme Digestion or Gateway Cloning
CRISPR
CRISPR/Cas9
Designing Prime Editing gRNA
Gibson Assembly and CRISPR were left unchecked because the current project does not involve cloning by PCR/Gibson methods or genome editing. The MelC2 module was designed digitally and prepared for synthesis through Twist, so primer design, PCR, Gibson Assembly, restriction-based cloning, CRISPR/Cas9, and prime-editing gRNA design were not part of this stage.
4.3 Two techniques expanded
The two most important techniques are DNA construct design, which creates the module, and cell-free reactions, which test whether the module produces an interpretable optical output.
DNA construct design: DNA construct design is central because Aim 1 depends on building a MelC2-based module that can be tested in TX-TL / E. coli conditions. I used database research to select MelC2, codon optimization to adapt the sequence for expression, and Benchling to assemble the cassette. The C-terminal 6xHis tag supports future protein-level validation. This matters because MelC2 expression must be distinguished from actual melanin-like pigment formation.
Cell-free reactions: Cell-free TX-TL is the cleanest first platform because the main uncertainty is chemical: whether MelC2 expression, copper loading, L-tyrosine availability, pH, oxygen, and downstream oxidation chemistry can generate optical change. Compared with living cells, the system is easier to control and interpret. It also avoids adding material-scaffold complexity too early. The first experiments use fluorescent protein controls, visible darkening, OD 400-500 nm absorbance, SDS-PAGE, and future LC-MS to validate the module step by step.
4.4 Industry Council companies relevant to the project
These companies are relevant because they map onto the main project needs: DNA synthesis, automation, modeling, chemical analysis, reagents, and future biomaterial translation.
| Company | Relevance to project |
| --- | --- |
| Twist Bioscience | DNA synthesis for the MelC2 expression cassette |
| Ginkgo Bioworks | Autonomous cell-free reaction testing and experimental automation |
| Asimov / Kernel | Future modeling of light-responsive genetic control |
| Waters Corporation | LC-MS analysis of L-tyrosine, L-DOPA, and oxidation intermediates |
| MilliporeSigma | Reagents such as L-tyrosine, CuSO4, buffers, and analytical standards |
| Thermo Fisher Scientific | Molecular biology reagents, protein analysis tools, and general lab workflows |
| BioFabricate | Future biomaterial, textile, and design-oriented applications |
| Cultivarium | Potential relevance for future non-model organism or biomaterial chassis engineering |
A particularly relevant external benchmark is MelaTech, a startup focused on melanin-based materials for space applications.
SECTION 5: RESULTS & QUANTITATIVE EXPECTATIONS
5.1.1 Aspect of the final project validated
I validated the DNA design foundation of the project: the construction of a MelC2 tyrosinase expression cassette for TX-TL / E. coli use. This validation addresses the first build layer of the project, because a reliable genetic module is required before testing melanin-like pigment formation in a cell-free system.
The validated output is not melanin production itself, but a synthesis-ready construct designed to express a soluble, oxygen- and copper-dependent tyrosinase. This includes enzyme selection, codon optimization, expression cassette design, vector assembly in Benchling, Twist submission, and a planned validation workflow for future cell-free testing.
5.1.2 Validation protocol
The figure below, from my CL Final Project presentation, summarizes the project pipeline around the validated build layer: MelC2 selection, DNA construct design, Twist submission, and planned cell-free validation.
I followed this protocol:
5.1.2.1 Enzyme selection
I selected MelC2 from Streptomyces antibioticus as the target enzyme for the first construct. I chose MelC2 because it is a soluble, cytosolic, oxygen- and copper-dependent enzyme of reasonably small size (about 273 amino acids) with a reviewed Swiss-Prot annotation, which made it a strong first candidate for cell-free TX-TL expression.
Its dependence on copper, oxygen, substrate availability, pH, and downstream polymerization also gives the project a clear set of tunable variables for validating melanin-like pigment formation.
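For the later SDS-PAGE step it helps to have a rough expected band size. The sketch below estimates it from the residue count using the common rule of thumb of about 110 Da per amino acid residue; that average is an approximation I am assuming here, not a value computed from the actual MelC2 sequence, so the result is only a ballpark figure.

```python
# Rule-of-thumb average residue mass; an assumption, not a sequence-derived value.
AVG_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(n_residues: int) -> float:
    """Approximate protein mass in kDa from the number of residues."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


melc2_residues = 273   # length stated in the text
his_tag_residues = 6   # C-terminal 6xHis tag used in the construct

print(f"MelC2 alone:      ~{approx_mass_kda(melc2_residues):.1f} kDa")
print(f"MelC2 with 6xHis: ~{approx_mass_kda(melc2_residues + his_tag_residues):.1f} kDa")
# Prints ~30.0 kDa and ~30.7 kDa respectively.
```

A sequence-based calculation would be more accurate; this estimate is only meant to indicate roughly where on the gel to look.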
And here is the predicted structural model used as part of my design context:
5.1.2.2 Codon optimization
I codon-optimized the MelC2 sequence for E. coli K-12 expression in Benchling.
Codon optimization of P07524 for E. coli K-12, avoiding BsaI/BsmBI/BbsI sites and adding a C-terminal His-tag to quantify enzyme expression cleanly. Results in Benchling here.
I selected the region of the amino acid sequence I wanted to back-translate and right-clicked on the highlighted region. From the codon optimization tab:
Host: E. coli K-12
Method: Match codon usage
GC content: Medium (0.33 to 0.66), because the extremes may be inconvenient. High GC can create strong secondary structures, while low GC can cause instability/repeats and make synthesis harder.
Uridine depletion: off (not relevant for bacterial expression)
Hairpin parameters: Stem size: 8 and Window 50
Restriction sites: avoid BsaI, BsmBI, BbsI (Type IIS restriction enzymes, the workhorses of Golden Gate assembly)
Patterns to reduce: AAAAAA and ATATATATA
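The settings above can also be double-checked outside Benchling with a few lines of Python. The sketch below scans a DNA sequence for the three Type IIS recognition sites on both strands, counts the two repeat patterns, and reports GC fraction. The function names and the toy sequence are hypothetical, and the check looks for the repeat patterns only on the strand given, which may differ from how Benchling evaluates them.

```python
import re

# Recognition sequences (5'->3') of the Type IIS enzymes listed above.
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC", "BbsI": "GAAGAC"}
# Repetitive patterns listed under "Patterns to reduce".
AVOID_PATTERNS = ["AAAAAA", "ATATATATA"]
# "Medium" GC window from the settings.
GC_RANGE = (0.33, 0.66)


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_overlapping(seq: str, pattern: str) -> int:
    # A lookahead match counts overlapping occurrences as well.
    return len(re.findall(f"(?={re.escape(pattern)})", seq))


def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def screen_sequence(seq: str) -> dict:
    """Return {name: count} for every forbidden site or pattern found."""
    seq = seq.upper()
    hits = {}
    for name, site in TYPE_IIS_SITES.items():
        # These sites are not palindromic, so both orientations are checked.
        n = count_overlapping(seq, site) + count_overlapping(seq, reverse_complement(site))
        if n:
            hits[name] = n
    for pattern in AVOID_PATTERNS:
        n = count_overlapping(seq, pattern)
        if n:
            hits[pattern] = n
    return hits


toy = "ATGGGTCTCAAAAAAGC"  # hypothetical sequence, not the MelC2 CDS
print(screen_sequence(toy))
print(round(gc_fraction(toy), 2), GC_RANGE[0] <= gc_fraction(toy) <= GC_RANGE[1])
```

Running `screen_sequence` on the exported optimized CDS should return an empty dictionary if the optimization respected the constraints.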
I clicked on “Preview Optimization” and got this result, which I’ve saved in the same Benchling folder here:
BLASTP verification of codon-optimized sequence:
I translated the codon-optimized DNA and ran BLASTP against nr/ClusteredNR. The top hits were MelC2 tyrosinases from Streptomyces spp., with 100% query coverage, E-value 0.0, 92% identity (251/273), 95% positives (261/273), and 0 gaps. Conserved domain analysis identified the Tyrosinase domain across the full sequence length. This confirms the optimized DNA still encodes a canonical tyrosinase.
melC2 tyrosinase (Streptomyces antibioticus, P07524, codon-optimized for E. coli K-12) DNA sequence: Benchling link here.
5.1.2.3 Protein-detection design
I added a C-terminal 6xHis tag (CACCACCACCACCACCAC) before the stop codon to support future protein-level detection / quantification.
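A quick way to confirm that the tag stays in frame is to append it to the CDS and translate the result. The sketch below does this with a hand-built standard codon table; the short CDS used in the example is a hypothetical placeholder, not the MelC2 sequence, and the helper names are mine.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

HIS_TAG = "CACCACCACCACCACCAC"  # 6xHis tag sequence given in the text
STOP = "TAA"                    # stop codon used in the cassette


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def add_c_terminal_his_tag(cds_without_stop: str) -> str:
    """Append the 6xHis tag and stop codon to a CDS that has no stop codon."""
    if len(cds_without_stop) % 3 != 0:
        raise ValueError("CDS is out of frame")
    return cds_without_stop.upper() + HIS_TAG + STOP


toy_cds = "ATGACCGTGCGT"  # hypothetical 4-codon placeholder CDS
print(translate(HIS_TAG))                          # HHHHHH
print(translate(add_c_terminal_his_tag(toy_cds)))  # MTVRHHHHHH*
```

The same check applied to the real optimized CDS should end in `HHHHHH*`, confirming that the tag is in frame and followed directly by the stop codon.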
5.1.2.4 Expression cassette assembly
I assembled the TX-TL expression cassette using a T7 promoter, RBS (Shine-Dalgarno) / AAATAT spacer, codon-optimized melC2 CDS, C-terminal 6xHis tag, TAA stop codon, and T7 terminator BBa_B0015. Benchling link here.
To be considered: T7 can maximize protein yield but can also overwhelm folding capacity, causing inactive protein to accumulate (increasing the likelihood that the tyrosinase misfolds, aggregates, or fails to incorporate copper correctly). I would replace it with a moderated-expression construct and compare the results against BBa_K2481108 as a control.
I placed the full expression cassette into a pTwist Amp High Copy vector. Why: high-copy propagation in E. coli for easy plasmid prep; the selection marker is standard.
I inspected the final construct map in Benchling to confirm the organization of the insert, vector, promoter, terminator, and annotated CDS. Assemblies on Benchling here.
Final construct: my melC2 construct, assembled into pTwist Amp High Copy and submitted through the Benchling interface.
5.1.2.6 Twist submission
I submitted the final construct for synthesis through Twist. My Twist order for the final construct is here.
This completed the validated DNA-design layer of the project.
Planned next validation
After synthesis, the construct will be tested in a staged cell-free workflow: fluorescent protein controls for TX-TL capacity, visible darkening and OD 400-500 nm for pigment formation, SDS-PAGE / His-tag detection for MelC2 expression, and future LC-MS for tyrosine / L-DOPA-related intermediates.
I also began planning Ginkgo RAC-style cell-free reaction conditions to test key bottlenecks such as copper availability, tyrosine concentration, pH buffering, magnesium, incubation time, and reporter choice.
Here are some variables I had in mind when formulating this first set of 8 master mix compositions:
Melanin production in E. coli or in a cell-free system is influenced by several parameters that act at the level of melC2 expression and enzyme activity / downstream reactions:
CuSO4 concentration: since this tyrosinase is a type 3 copper-containing enzyme, Cu2+ is a cofactor of the enzyme. Too much copper can also stress cells or inhibit cell-free reactions.
Magnesium
Energy mix
Molecular oxygen availability for tyrosinase reactions
pH: tyrosinase activity and melanin polymerization are pH-dependent. If the reaction acidifies over time, enzyme activity or pigment formation may decrease.
My first 8 experiments at Ginkgo: the aim is to successfully produce fluorescent protein and generate an initial dataset for analysis.
sfGFP → system calibration (TX-TL health)
Melanin has a broad absorbance spectrum, but it absorbs much more strongly at shorter wavelengths (blue/green) than at longer wavelengths (red). This interferes with the optical readout: fluorescence is measured in a reaction that is simultaneously getting darker, so the accumulating pigment absorbs part of the excitation and emission light and distorts the signal.
mScarlet-I → expression readout for melC2 tyrosinase specifically
mScarlet-I fluorescence is less sensitive to melanin, so it better tracks expression alone: sfGFP (Ex ~488 nm / Em ~510 nm) overlaps strongly with melanin absorbance, mTurquoise2 is even worse (blue region), and mScarlet-I (Ex ~569 nm / Em ~594 nm) overlaps less.
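The reporter comparison above can be illustrated with a rough numerical sketch. It assumes melanin absorbance falls off exponentially with wavelength and applies the usual first-order inner-filter approximation, in which the detected fraction of fluorescence is 10^(-(A_ex + A_em)/2). The decay constant, the reference absorbance, and the reference wavelength are all hypothetical values chosen for illustration; only the excitation and emission wavelengths come from the text.

```python
import math

# Hypothetical model parameters (assumptions, not measured values).
MELANIN_DECAY_PER_NM = 0.005   # exponential decay of absorbance with wavelength
REF_WAVELENGTH_NM = 450.0      # reference point inside the OD 400-500 nm window

# Approximate excitation / emission maxima quoted in the text (nm).
FLUOROPHORES = {
    "sfGFP": (488.0, 510.0),
    "mScarlet-I": (569.0, 594.0),
}


def melanin_absorbance(wavelength_nm: float, a_ref: float) -> float:
    """Absorbance at a wavelength, given absorbance a_ref at the reference."""
    return a_ref * math.exp(-MELANIN_DECAY_PER_NM * (wavelength_nm - REF_WAVELENGTH_NM))


def fraction_detected(ex_nm: float, em_nm: float, a_ref: float) -> float:
    """First-order inner-filter estimate of the fluorescence still detected."""
    a_ex = melanin_absorbance(ex_nm, a_ref)
    a_em = melanin_absorbance(em_nm, a_ref)
    return 10 ** (-(a_ex + a_em) / 2)


a_ref = 0.5  # hypothetical pigment absorbance at 450 nm
for name, (ex_nm, em_nm) in FLUOROPHORES.items():
    print(f"{name}: ~{fraction_detected(ex_nm, em_nm, a_ref):.2f} of signal detected")
```

Under these assumptions mScarlet-I retains a larger share of its signal than sfGFP at the same pigment level, which is the qualitative argument for using it as the expression readout; real attenuation depends on plate geometry and the actual pigment spectrum.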
For optimizing the Master Mix design for mScarlet-I in my melC2 tyrosinase cell-free system, I’d supplement CuSO4 since my analyte is a copper-dependent enzyme, HEPES-KOH pH 7.5 to have an additional buffer against acidification and magnesium glutamate to improve translation capacity.
At first I thought about adding glucose since it could extend energy regeneration, but then I realized that it may also increase acidification. Since I am concerned about fluorescence readout in a pigment-producing system, I’d prioritize pH stability over extra glucose.
I’d also supplement L-tyrosine, which serves as a functional validation that my protein of interest, MelC2 tyrosinase, is being expressed and is active.
The Master Mix designs to be tested using mScarlet-I and sfGFP (the 8 reactions outlined) are available here in Week 11 HW Documentation.
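One simple way to lay out eight reactions is a 2 x 2 x 2 full factorial. The sketch below generates such a matrix over reporter, CuSO4, and L-tyrosine. It is an illustration of the layout idea only: the choice of these three factors and all concentration values are hypothetical and are not the actual Week 11 master mix designs.

```python
from itertools import product

# Hypothetical factor levels; not the actual Week 11 reaction designs.
REPORTERS = ["sfGFP", "mScarlet-I"]
CUSO4_UM = [0, 10]        # hypothetical CuSO4 levels (micromolar)
L_TYROSINE_MM = [0, 1]    # hypothetical L-tyrosine levels (millimolar)


def build_reaction_matrix() -> list:
    """Return one dict per reaction for a 2 x 2 x 2 full factorial design."""
    combos = product(REPORTERS, CUSO4_UM, L_TYROSINE_MM)
    return [
        {"reaction": i + 1, "reporter": rep, "CuSO4_uM": cu, "L_tyrosine_mM": tyr}
        for i, (rep, cu, tyr) in enumerate(combos)
    ]


for row in build_reaction_matrix():
    print(row)
```

A full factorial keeps each factor's effect separable, at the cost of leaving pH buffering, magnesium, and incubation time fixed in this first round.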
5.1.3 Synthetic biology techniques used
The main synthetic biology technique used was DNA construct design. I designed a codon-optimized MelC2 tyrosinase cassette for TX-TL / E. coli expression, added a C-terminal 6xHis tag for future protein detection, assembled the cassette in Benchling, and prepared it for synthesis through Twist.
I also used database-based sequence selection and verification. UniProt and Benchling were used to select and inspect the MelC2 sequence, while BLASTP and conserved-domain analysis were used to confirm that the codon-optimized DNA still encoded a canonical tyrosinase.
A third relevant technique was cell-free system planning. The construct was designed specifically for TX-TL / E. coli use, and the next validation workflow was planned around fluorescent protein controls, visible darkening, OD 400-500 nm absorbance, SDS-PAGE / His-tag detection, and future LC-MS analysis.
Finally, I used lab automation planning by preparing a Ginkgo RAC-style reaction matrix to test variables expected to affect MelC2 pigment formation, including copper availability, L-tyrosine concentration, pH buffering, magnesium, incubation time, and reporter choice.
5.1.4 Data and analysis
The validation data for this stage are design-level and sequence-level results generated during construct preparation. These data show that the MelC2 construct is synthesis-ready and still encodes the intended tyrosinase target.
| Validation item | Result / quantitative expectation | Interpretation |
| --- | --- | --- |
| Target enzyme | MelC2 tyrosinase from Streptomyces antibioticus | Selected as the first melanin-forming enzyme candidate |
| Protein length | ~273 amino acids | Small enough for practical TX-TL expression testing |
| Codon optimization host | E. coli K-12 | Matches the intended TX-TL / E. coli expression context |
| GC content after optimization | 57% | Within a workable synthesis and expression range |
| Rare codons | 6 | Low enough to support expression feasibility |
| Hairpins detected | 0 | Reduces risk of problematic RNA secondary structure |
| AAAAAA occurrences | 0 | Removes a problematic repetitive A-rich pattern |
| ATATATATA occurrences | 0 | Removes a problematic repetitive AT-rich pattern |
| Avoided restriction sites | BsaI, BsmBI, BbsI | Improves compatibility with future Type IIS cloning workflows |
| Detection feature | C-terminal 6xHis tag | Enables future protein-level validation |
| BLASTP query coverage | 100% | Optimized sequence still aligns across the full tyrosinase sequence |
| BLASTP E-value | 0.0 | Strong sequence-level match |
| BLASTP identity / positives | 92% identity / 95% positives | Confirms the optimized construct still encodes a MelC2-like tyrosinase |
| Gaps | 0 | No major sequence disruption introduced by optimization |
| Conserved domain | Tyrosinase domain across full sequence | Confirms the intended enzyme family was preserved |
These data validate the first build layer of the project: the DNA module is codon-optimized, annotated, compatible with the intended TX-TL / E. coli context, and submitted for synthesis. The results do not prove melanin production, but they confirm that the construct is coherent enough to justify downstream expression testing. The next quantitative expectation is that successful cell-free expression should produce detectable MelC2 by SDS-PAGE / His-tag detection and measurable pigment accumulation by OD 400-500 nm if the enzyme is active under the tested conditions.
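Several of the sequence-level checks in the table (GC content, A-rich and AT-rich repeats, Type IIS sites) can be reproduced with a short script. The sketch below is one possible way to do this and is not the tool used to generate the values above; the function names and the `toy` test fragment are made up for the example. The recognition sequences are the standard ones for BsaI (GGTCTC), BsmBI (CGTCTC), and BbsI (GAAGAC).

```python
# Type IIS recognition sites (5'->3') for the enzymes avoided in the design.
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC", "BbsI": "GAAGAC"}
PROBLEM_MOTIFS = ["AAAAAA", "ATATATATA"]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)

def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlapping matches."""
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))

def qc_report(seq):
    """Summarize GC content, repeat motifs, and Type IIS sites on both strands."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    report = {"gc_percent": round(gc_percent(seq), 1)}
    for motif in PROBLEM_MOTIFS:
        report[motif] = count_overlapping(seq, motif)
    for enzyme, site in TYPE_IIS_SITES.items():
        # These sites are non-palindromic, so each strand is searched separately.
        report[enzyme] = count_overlapping(seq, site) + count_overlapping(rc, site)
    return report

toy = "ATGGGTCTCAAAAAAGCC"  # made-up fragment, not the MelC2 cassette
print(qc_report(toy))
```

Running the same report on the final cassette sequence exported from Benchling would be expected to return zero for every motif and enzyme if the design constraints were met.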
5.2 Unexpected challenges, limitations, and alternatives
The main limitation is that a correct DNA construct does not automatically prove protein activity or melanin-like pigment formation. Tyrosinase expression can be detected while pigment remains absent if folding, copper incorporation, substrate availability, oxygen, pH, or downstream polymerization chemistry is limiting.
Another challenge is that melanin is a chemically heterogeneous output, so visible darkening alone is not enough to validate the system. To address this, the next validation stage separates TX-TL expression capacity, MelC2 protein production, pigment accumulation, and pathway-level chemistry using fluorescence controls, OD 400-500 nm, SDS-PAGE / His-tag detection, and future LC-MS. If the T7 design produces inactive or misfolded protein, an alternative strategy would be to test a moderated promoter, adjust copper and substrate concentrations, or compare purified enzyme / synthetic melanin-like polymer approaches before moving into more complex biomaterial systems.
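To make the pigment readout less dependent on visual judgment, the OD 400-500 nm signal can be compared against a no-DNA control. The sketch below shows one simple way to compute such a background-subtracted value; the function names and all absorbance readings are hypothetical and only illustrate the calculation.

```python
def mean_window_absorbance(spectrum, lo_nm=400, hi_nm=500):
    """Average absorbance over wavelengths within [lo_nm, hi_nm]."""
    values = [a for wavelength, a in spectrum.items() if lo_nm <= wavelength <= hi_nm]
    if not values:
        raise ValueError("no readings inside the wavelength window")
    return sum(values) / len(values)

def pigment_signal(sample, no_dna_control):
    """Window-averaged absorbance of the sample minus that of the control."""
    return mean_window_absorbance(sample) - mean_window_absorbance(no_dna_control)

# Hypothetical readings: wavelength (nm) -> absorbance.
melc2_reaction = {400: 0.50, 450: 0.40, 500: 0.30, 600: 0.10}
no_dna_control = {400: 0.10, 450: 0.10, 500: 0.10, 600: 0.05}

print(round(pigment_signal(melc2_reaction, no_dna_control), 3))
```

A positive value only indicates extra absorbance in the window, so it would still need to be interpreted alongside the SDS-PAGE / His-tag and LC-MS results.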
This budget estimates the next practical stage of the project: validating the MelC2 construct in a cell-free TX-TL system before moving into material integration.
The cost ranges below were estimated with the assistance of ChatGPT and should be treated as approximate planning values. The estimation method was to break the project into major experimental cost categories - DNA synthesis, TX-TL reactions, controls, reagents, consumables, protein validation, chemical validation, and material integration - and assign conservative low/high ranges for each category based on typical small-scale synthetic biology workflows.
The lower end of each range assumes access to shared lab equipment, existing stocks of common reagents, and limited reaction numbers. The higher end assumes new reagent purchases, larger reaction matrices, external analytical services, or the need to purchase or arrange access to readout equipment. Exact costs would need to be confirmed through vendor quotes, institutional core facility pricing, or cloud-lab pricing.
| Category | Supplies / services | Estimated cost | Notes |
| --- | --- | --- | --- |
| DNA synthesis | MelC2 TX-TL expression cassette in pTwist Amp High Copy vector | $150-300 | One synthesis-ready expression cassette |
| Cell-free TX-TL reaction system | E. coli TX-TL master mix or freeze-dried cell-free reaction kit | $300-800 | Enough material for expression controls and an initial MelC2 reaction matrix |
| DNA / expression controls | sfGFP control plasmid or template; mScarlet-I control plasmid or template | $100-300 | Used to confirm that the TX-TL system supports protein expression |
| Substrates and cofactors | L-tyrosine; CuSO4; magnesium glutamate; nuclease-free water | $100-250 | Core reaction components for testing tyrosinase activity |
| Buffering and reaction-condition reagents | HEPES-KOH pH 7.5; additional salts or energy-mix supplements if needed | $100-250 | Used to adjust pH, magnesium, and reaction stability |
| Consumables | PCR tubes or reaction tubes; pipette tips; microcentrifuge tubes; plate or strip-tube format for reaction imaging | $100-250 | Disposable materials for small-volume reactions |
| Optical readout equipment | Plate reader or spectrophotometer capable of OD 400-500 nm; fluorescence readout for sfGFP / mScarlet-I | $0 if shared; $5,000+ if purchased | The project requires access to the instrument, not necessarily purchase |
| Protein-expression validation | SDS-PAGE gel system access; protein ladder; gel stains; optional His-tag detection reagents | $150-500 | Confirms whether MelC2 protein is produced independently of pigment output |
| Chemical validation | LC-MS access for L-tyrosine, L-DOPA, and related intermediates; analytical standards | $300-1,500 | Cost depends on shared facility access, outsourcing, and sample number |
| Automation / cloud lab testing | Ginkgo RAC-style cell-free reaction matrix, if available | Variable | Not included in the main estimate because pricing depends on platform access |
Estimated total for first validation stage: approximately $850-3,600, assuming access to shared lab equipment.
This total excludes major equipment purchases and uncertain cloud-lab pricing. It includes the core experimental costs needed to move from a designed MelC2 construct to initial TX-TL expression, pigment-production screening, and basic validation.
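For planning purposes, the low and high ends of the budget can be recomputed whenever a category is added or dropped. The sketch below copies the priced rows from the table, counts shared optical equipment as $0, and leaves out the cloud-lab row because its cost is listed as variable. Which rows are counted toward a given total is controlled by the `include` argument, and that selection is an assumption of the example rather than something specified in the text.

```python
# (category, low, high) in USD, copied from the budget table.
# Shared optical readout equipment is counted as $0; cloud-lab pricing is omitted.
BUDGET = [
    ("DNA synthesis", 150, 300),
    ("Cell-free TX-TL reaction system", 300, 800),
    ("DNA / expression controls", 100, 300),
    ("Substrates and cofactors", 100, 250),
    ("Buffering and reaction-condition reagents", 100, 250),
    ("Consumables", 100, 250),
    ("Optical readout equipment (shared)", 0, 0),
    ("Protein-expression validation", 150, 500),
    ("Chemical validation", 300, 1500),
]

def total_range(rows, include=None):
    """Sum the low and high ends for all rows, or only for the named categories."""
    chosen = [row for row in rows if include is None or row[0] in include]
    return sum(row[1] for row in chosen), sum(row[2] for row in chosen)

reagents_only = {name for name, _, _ in BUDGET[:6]}  # DNA synthesis through consumables

print("All priced rows:", total_range(BUDGET))
print("Reagents and consumables only:", total_range(BUDGET, include=reagents_only))
```

Summing only the six reagent and consumable rows gives a low end of $850, while including protein and chemical validation raises both ends of the range.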
Estimated total including purchased equipment or external analytical services: could exceed $5,000-10,000, depending on instrument access, number of samples, and whether LC-MS or optical readout must be outsourced or purchased.
This documentation was developed with the assistance of ChatGPT, which was used to support drafting, editing, organization, and figure generation. All scientific decisions, final content, and interpretations were reviewed and approved by the author.
Please check our most recently updated Google Docs on this.
Note on project status
The Group Final Project became optional for Spring 2026, with collaborative work expected to resume later. Because of this, my documentation focuses on the individual contribution I made during the planning and design phase rather than on a completed experimental workflow.
My main contribution was to help define candidate MS2 L-protein mutations using a combination of protein language model scoring, experimental mutant data, and biological reasoning about L-protein functional regions. The goal was to identify a small set of interpretable mutations that could later be tested experimentally for effects on L-protein stability, DnaJ dependence, membrane insertion, and lysis function.
Here’s a summary of my main individual contributions to the plan for engineering the bacteriophage:
I ran the provided mutational scoring notebook to obtain per-substitution LLR scores for the MS2 L-protein and shortlisted substitutions with positive scores. The full scoring results are included in a table on my Homework 5 page.
I then cross-checked these shortlisted mutations against the provided experimental mutant dataset, L-Protein Mutants, which reports amino acid substitutions and their measured lysis phenotypes.
The overlap between the two datasets suggests that sequence-based LLR scores capture only part of the functional landscape of the MS2 L-protein. More broadly, positive LLR scores may reflect sequence plausibility or local biochemical compatibility, but they do not fully account for higher-order constraints such as host-factor dependence, membrane behavior, and oligomer formation.
Therefore, I decided to select five candidate mutations by combining positive LLR scores with biological reasoning about the protein’s distinct functional domains, treating LLR scores as a prioritization tool for experimental testing rather than as a direct predictor of lytic function.
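The shortlist-then-cross-check step can be sketched as below. This is not the provided scoring notebook; the mutation names, LLR values, and phenotype labels are placeholders invented for the example, and the function names are hypothetical.

```python
# Placeholder data for illustration only (not real MS2 L-protein results).
llr_scores = {"A10G": 1.5, "K20E": -0.7, "L30F": 0.4, "T40S": 2.2}
experimental_phenotypes = {"A10G": "lysis retained", "K20E": "lysis lost"}

def shortlist_positive(scores, threshold=0.0):
    """Return mutations scoring above the threshold, highest score first."""
    positive = [mutation for mutation, score in scores.items() if score > threshold]
    return sorted(positive, key=lambda mutation: scores[mutation], reverse=True)

def cross_check(shortlisted, phenotypes):
    """Attach the measured phenotype to each shortlisted mutation, if one exists."""
    return {mutation: phenotypes.get(mutation, "not tested") for mutation in shortlisted}

shortlisted = shortlist_positive(llr_scores)
for mutation, phenotype in cross_check(shortlisted, experimental_phenotypes).items():
    print(f"{mutation}: LLR={llr_scores[mutation]:+.1f}, phenotype={phenotype}")
```

Mutations that score positively but have no experimental record are the ones where biological reasoning about the functional regions has to carry the selection.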
The MS2 L-protein is organized into distinct functional domains:
Hydrophilic N-terminal region involved in DnaJ-mediated folding
Transmembrane/C-terminal region responsible for membrane insertion and pore formation
The two soluble-region mutants, S9Q and C29R, were chosen to probe effects on folding and possible DnaJ dependence, whereas the three transmembrane mutants, A45L, T52L, and N53L, were chosen to probe membrane insertion and oligomerization.
S9Q. Selection Rationale: High positive score in the soluble region (putative DnaJ-interaction domain). Ser→Gln increases hydrogen-bonding potential and may alter surface chemistry without strongly destabilizing the fold.
C29R. Selection Rationale: One of the strongest positive-scoring substitutions in the soluble region. Adds a positive charge that could reshape chaperone-recognition or interaction surfaces.
A45L. Selection Rationale: Hydrophobic substitution in the transmembrane segment. Ala→Leu increases hydrophobicity and may stabilize membrane helix packing/insertion and oligomer stability.
T52L. Selection Rationale: Polar→hydrophobic change in the TM region. Thr→Leu may increase membrane compatibility and reduce local insertion/misfolding penalties.
N53L. Selection Rationale: Polar→hydrophobic change in the TM region with a strong positive score. Selected as an additional TM-stabilizing candidate.
Brainstorms
Melanin-based bioink for Light-Recording Materials
My individual final project is based on melanin and related compounds in an engineered living material (ELM) used as a color-responsive bio-ink. Among many other factors, oxidation state, precursor availability, and intermediate reaction pathways likely shape tone and long-term stability, and they may be modulated using a genetic system such as a bacterium or a synthetic minimal cell.
Melanin itself is a heterogeneous, hard-to-define analyte, so my idea is to use its main defined intermediates, such as L-DOPA, dopamine, and quinones, as analytes, with a high-resolution method like LC-MS as the calibration/ground-truth method. The aim is to understand and quantify the melanin-related compounds that affect the darkening output of the ink/material. I would then use protein design to build embedded sensing for spatial or real-time readouts inside the material. The goal is a fine-tuning system that relates the color tone of the material to the synthesis of the different melanin compounds, together with control mechanisms that can trigger it (different UV light wavelengths, for instance).
Explore whether melanin-based optical outputs can be generated within different biomaterials, such as bacterial cellulose (BC) and ELMs, for applications in fashion, design, and light-recording materials.
I want to establish a first melanin-producing genetic platform and fine-tune its pigmentation at high resolution. The strongest version of the project is a bio-based material that gradually develops melanin-derived tonal variation in response to different input signals (e.g., different UV wavelengths), behaving less like a dyed textile and more like an exposure-recording surface.
Since K. rhaeticus naturally produces cellulose, it also lets me focus on material-producing biology in a native chassis instead of forcing cellulose synthesis into a non-native organism. On top of that, I am interested in the possibility of later embedding synthetic minimal cells into the cellulose as localized, non-growing modules for sensing and pigment generation.
A major question for me is what the right analyte is. Since melanin is a heterogeneous polymer, I think it does not make sense to treat it as a single clean measurable output. Because of that, I am leaning toward more tractable analytes, such as the expressed enzyme itself or melanin-related intermediates like L-tyrosine, L-DOPA, dopamine, quinones, DHI, or DHICA.
This is where LC-MS starts to feel really central to the project. I started thinking that maybe the application should be chosen based on what LC-MS is actually powerful enough to resolve. That led me to think about applications where fine control over color, stability, or chemical state is especially important:
Bio-based inks or photography, where oxidation state could shape color and long-term stability.
The ink and photography direction is especially interesting to me because the final image might look stable, but what defines tone and durability may actually be determined much earlier by oxidation chemistry.
Two materials could look similar at first, but age very differently depending on how those intermediates evolved. In that case, LC-MS could help connect invisible intermediate chemistry to visible outcomes in the final material.
Bioadhesives or coatings, where intermediate catechol chemistry may directly determine performance.
The bioadhesive or catechol-based coating direction also seems compelling. These systems often depend on catechol-containing molecules like dopamine or L-DOPA, which can oxidize into quinones and then participate in crosslinking. That balance between reduced catechol and oxidized quinone seems to shape adhesive behavior. So instead of only testing the final strength of an adhesive, LC-MS could potentially help track how the chemistry develops during formation and explain why some conditions produce better performance than others.
In these kinds of systems, LC-MS and fine-tuned control over the synthesis of melanin-related compounds do not feel like overkill to me. They feel like the right level of resolution for the chemistry that actually matters. So I am starting to think about the project less as “make a melanin material” in the broadest sense, and more as “choose a melanin-related material application where intermediate-state chemistry is central, measurable, and worth controlling.”
Project concept:
An engineered living material (ELM) based on bacterial cellulose (BC), using Komagataeibacter rhaeticus as the primary chassis, to produce melanin-based optical outputs in a cellulose material for fashion, design, and light-recording applications.
The current direction is not to maximize “smart material” complexity at once, but to first establish a robust melanin-producing BC platform, then evaluate whether additional functions such as keratin expression, self-repair, or embedded synthetic minimal cells are technically justified.
The strongest version of the project is a nude-toned or skin-adjacent material that gradually develops melanin-derived tonal variation in response to exposure conditions, producing a material that behaves less like a dyed textile and more like an exposure-recording surface.
Why bacterial cellulose?
BC is a strong candidate because it is:
biogenic and directly fabricable as a sheet-like material
compatible with engineered living material approaches
mechanically robust relative to many other microbial matrices
moldable as pellicles, spheroids, or printed structures
already supported by the Komagataeibacter Tool Kit (KTK), a modular cloning toolkit for this genus
In carbon-rich media, Komagataeibacter polymerizes and secretes linear glucose chains that self-assemble into a dense interconnected cellulose mesh. This cellulose pellicle forms at the air-liquid interface and behaves like a biofilm-like material scaffold around the producing cells.
Which chassis?
Primary chassis: Komagataeibacter rhaeticus
A high-yield bacterial cellulose producer and a strong chassis for BC-based ELMs.
Why Komagataeibacter rhaeticus?
native bacterial cellulose production
established relevance for BC-based material engineering
allows the project to focus on more specific objectives for material-producing biology, rather than forcing cellulose synthesis into a non-native organism like E. coli
Secondary system: synthetic minimal cells embedded in BC
As a second aim, the project may incorporate synthetic minimal cells (SMCs) as embedded, non-replicating functional modules inside or on the cellulose material. These SMCs would add localized, compartmentalized sensing and pigment-generation functions to the BC scaffold. A useful synthetic minimal cell for this project would therefore be a light-exposure logging vesicle embedded in or deposited onto bacterial cellulose.
The living BC producer, K. rhaeticus, builds the material scaffold, while the synthetic minimal cells act as vesicle-based modules that provide controlled, non-growing sensing and melanin output. This separation may be useful if pigment production or sensing logic is easier to implement in a compartmentalized cell-free system than in the BC-producing chassis itself.
Main questions
1- Since melanin is a heterogeneous polymer, which analyte should I choose to analyse?
I might want to confirm the expressed enzyme/protein (for example tyrosinase, laccase, TyrP, or another melanin-related enzyme) or melanin intermediates such as L-tyrosine, L-DOPA, dopaquinone-derived products, DHICA, and DHI.
These are often much more tractable by LC-MS than melanin itself.
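One reason defined intermediates are more tractable is that they can be quantified against analytical standards. The sketch below fits a straight-line calibration of peak area against standard concentration and inverts it to estimate an unknown. The concentrations, peak areas, and function names are invented for the example, and a real LC-MS workflow may need a weighted or non-linear fit and internal standards.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def concentration_from_area(area, slope, intercept):
    """Invert the calibration line to estimate concentration from a peak area."""
    return (area - intercept) / slope

# Hypothetical L-DOPA standards: concentration (uM) and measured peak area.
standard_conc = [0.0, 1.0, 2.0, 4.0]
standard_area = [5.0, 105.0, 205.0, 405.0]

slope, intercept = fit_line(standard_conc, standard_area)
unknown = concentration_from_area(255.0, slope, intercept)
print(f"slope={slope:.1f}, intercept={intercept:.1f}, unknown={unknown:.2f} uM")
```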
Other questions
Nutrient availability: If the final material remains living, nutrient supply becomes a major constraint.
Biosafety: use of non-replicating synthetic minimal cells
Aims
AIM 1: Define and model a first light-responsive melanin-producing synthetic minimal cell for integration into bacterial cellulose
Develop a specific in silico design for a phospholipid vesicle-based synthetic minimal cell that uses EL222 to activate melA expression under blue light, with the goal of generating visible melanin production as a localized output that could later be embedded into bacterial cellulose made by K. rhaeticus. This aim focuses on specifying the exact first system, its required components, and whether its chemistry and logic are feasible before any experimental implementation.
AIM 1 Specific Objectives:
define the exact genetic module to be tested first: EL222 + melA
specify the full internal composition of the vesicle:
Tx/Tl source
ATP regeneration system
tyrosine
copper
salts/cofactors
define the membrane composition for the first prototype, e.g. POPC + cholesterol
map the input-output logic precisely:
input = blue light
regulator activation = EL222
output = tyrosinase expression
final material output = melanin accumulation / darkening
determine which molecules must be pre-encapsulated and which, if any, must cross the membrane
identify the minimum set of assumptions required for the system to function, i.e. specify the required materials, genes, lipids, cofactors, and readouts for the first prototype
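The input-output logic listed above (blue light, EL222 activation, tyrosinase expression, melanin accumulation) can be explored with a very small toy model before any wet-lab work. The sketch below uses forward-Euler integration of a light-gated expression term and a Michaelis-Menten conversion of tyrosine to pigment. Every rate constant, the starting tyrosine level, and the function name `simulate_vesicle` are hypothetical, and EL222 is reduced to an on/off switch, so the output is qualitative only.

```python
def simulate_vesicle(light_hours, total_hours=8.0, dt=0.01,
                     k_expr=1.0, k_deg=0.1, k_cat=0.5, km=0.5, tyrosine0=2.0):
    """Toy model: light-gated tyrosinase expression converting tyrosine to melanin.

    All parameters are placeholder values in arbitrary units (assumptions).
    """
    enzyme, tyrosine, melanin = 0.0, tyrosine0, 0.0
    for step in range(int(round(total_hours / dt))):
        t = step * dt
        # EL222 is treated as fully active while the light is on, inactive otherwise.
        light = 1.0 if t < light_hours else 0.0
        # Michaelis-Menten conversion of pre-encapsulated tyrosine by tyrosinase.
        rate = k_cat * enzyme * tyrosine / (km + tyrosine)
        consumed = min(rate * dt, tyrosine)
        enzyme += (k_expr * light - k_deg * enzyme) * dt
        tyrosine -= consumed
        melanin += consumed
    return {"enzyme": enzyme, "tyrosine": tyrosine, "melanin": melanin}

for hours_of_light in (0.0, 2.0, 4.0):
    result = simulate_vesicle(hours_of_light)
    print(hours_of_light, round(result["melanin"], 3))
```

In this simplified picture the vesicle behaves as an exposure logger: no light gives no pigment, and longer illumination gives more pigment until the encapsulated tyrosine runs out.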
AIM 2: Experimental planning and prototyping strategy for melanin integration into bacterial cellulose materials
Translate the selected design into a concrete experimental plan, prioritizing a staged workflow from simple proof of concept to material-level testing. This aim is not yet full implementation, but the preparation of a robust experimental roadmap that makes the project technically executable and testable.
Practical objectives:
measures of success / failure:
define the first measurable success criteria: visible darkening? absorbance increase? spatially localized pigment formation?
identify the main failure points of this exact design, such as insufficient expression, low tyrosinase activity, substrate limitation, or poor melanin accumulation
define the first build-test sequence, including which subsystem should be validated first:
melanin pathway in a tractable chassis
cell-free context
BC production in K. rhaeticus
integration of pigment module with BC
plan how BC will be fabricated and presented for testing, e.g. pellicles, spheroids, molded sheets, or layered composites
define how synthetic minimal cells would be embedded in, coated onto, or associated with BC
determine the primary experimental readouts: visible pigmentation; image-based quantification of tone; spatial patterning under differential light exposure; material compatibility and stability
define the controls needed to evaluate whether the system is functioning as intended
identify the decision points that determine whether the project should proceed with:
direct microbial engineering only
synthetic minimal cells only
a hybrid system
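For the image-based quantification of tone mentioned among the readouts, one simple option is a darkness index computed from RGB pixel values, compared between a light-exposed region and a masked region of the same sample. The sketch below uses the Rec. 709 luma weights applied directly to 8-bit RGB values, which is only an approximation of perceived lightness. The pixel values and function names are made up for the example, and in practice the pixels would come from a calibrated photograph or scan.

```python
REC709_WEIGHTS = (0.2126, 0.7152, 0.0722)  # luma weights for R, G, B

def darkness_index(pixels):
    """Return roughly 0.0 for pure white and 1.0 for pure black, averaged over pixels."""
    if not pixels:
        raise ValueError("region has no pixels")
    total = 0.0
    for r, g, b in pixels:
        total += REC709_WEIGHTS[0] * r + REC709_WEIGHTS[1] * g + REC709_WEIGHTS[2] * b
    return 1.0 - total / (255.0 * len(pixels))

def exposure_contrast(exposed, masked):
    """Difference in darkness between an exposed region and a masked region."""
    return darkness_index(exposed) - darkness_index(masked)

# Hypothetical 8-bit RGB samples from two regions of one pellicle image.
exposed_region = [(90, 70, 50), (100, 80, 60)]
masked_region = [(220, 200, 180), (230, 210, 190)]

print(round(exposure_contrast(exposed_region, masked_region), 3))
```

A positive contrast would indicate that the exposed region is darker than the masked one, which is one candidate for a first measurable success criterion.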
AIM 3: Evaluate secondary functional molecules only after establishing melanin as a robust first proof of concept
Keep melanin as the primary engineered output and assess other molecules only if they offer a clear, measurable improvement to the material. This aim is intended to prevent the project from becoming too diffuse too early and to ensure that any added complexity is justified by experimental value.
Practical objectives:
define which secondary properties would be worth pursuing only after melanin is validated, such as:
increased abrasion resistance
reduced permeability
improved mechanical robustness
antimicrobial activity
evaluate candidate molecules such as keratin or other structural/functional additives in terms of:
biological feasibility
compatibility with BC
expected measurable benefit
added engineering complexity
establish criteria for whether a second molecule is worth integrating into the platform by prioritizing only additions that significantly improve the material’s performance or expand its application in a clear and testable way.
Previous ideas
Historical register of the brainstorm for the Individual Project:
Later, I added 3 slides with an updated version of those 3 ideas in the appropriate slide deck for Committed Listeners here.
However, the current project direction is a different idea: a bacterial cellulose-based material platform for melanin-derived tonal output, potentially extended with synthetic minimal cells for compartmentalized light-responsive pigment generation.
But I decided to develop another idea not present in the initial registers.
Validation workflow for MelC2 pigment-production analysis. Generated with ChatGPT.
BioClub Committed Listener MoU
HTGAA Committed Listener (CL) Agreement
I am an HTGAA Committed Listener. My responsibilities are:
Watching class lectures and recitations
Participating in node reviews
Developing and documenting my homework
Actively communicating with other students and TAs on the forum
Allowing HTGAA and BioClub to share my work (with attribution)
Honestly reporting on my work, and appropriately attributing and citing the work of others (both human and non-human)
Following locally applicable health and safety guidance
Promoting a respectful environment free of harassment and discrimination
Signed by committing this file to my documentation page/repository,