Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    9th February 2026

  • Week 2 HW: DNA Read, Write, & Edit

    Part 1: Benchling & In-silico Gel Art. Part 3: DNA Design Challenge. 3.1. Choose your protein. I chose the PETase enzyme from the bacterium Ideonella sakaiensis (strain 201-F6) because it was discovered to be important in plastic degradation. Its ability to degrade plastic enables bioremediation, reducing plastic pollution and promoting a circular economy.

  • Week 3 HW: Lab Automation

    Assignment: Python Script for Opentrons Artwork - DUE BY YOUR LAB TIME! A Blooming Daisy Flower PINK, PURPLE & BLUE DESIGN! :) INITIAL DESIGN: Python documentation

from opentrons import types

metadata = {
    'author': 'Tammy Sisodiya',
    'protocolName': ' HTGAA Dazzling Daisy',
    'description': 'A blooming Daisy flower in Purple, Pink, and Blue.',
    'source': 'HTGAA 2026 Opentrons Lab',
    'apiLevel': '2.20'
}

##############################################################################
###   Robot deck setup constants
##############################################################################
TIP_RACK_DECK_SLOT = 9
COLORS_DECK_SLOT = 6
AGAR_DECK_SLOT = 5
PIPETTE_STARTING_TIP_WELL = 'A1'

# UPDATED: Mapping the new lab colors to source wells
well_colors = {
    'A1' : 'Purple',
    'B1' : 'Pink',
    'C1' : 'Blue'
}

def run(protocol):
    # Tips
    tips_20ul = protocol.load_labware('opentrons_96_tiprack_20ul', TIP_RACK_DECK_SLOT, 'Opentrons 20uL Tips')

    # Pipettes
    pipette_20ul = protocol.load_instrument("p20_single_gen2", "right", [tips_20ul])

    # Modules
    temperature_module = protocol.load_module('temperature module gen2', COLORS_DECK_SLOT)
    temperature_plate = temperature_module.load_labware('opentrons_96_aluminumblock_generic_pcr_strip_200ul', 'Cold Plate')
    color_plate = temperature_plate

    # Agar Plate
    agar_plate = protocol.load_labware('htgaa_agar_plate', AGAR_DECK_SLOT, 'Agar Plate')
    center_location = agar_plate['A1'].top()

    pipette_20ul.starting_tip = tips_20ul.well(PIPETTE_STARTING_TIP_WELL)

    # Helper Functions
    def location_of_color(color_string):
        for well,color in well_colors.items():
            if color.lower() == color_string.lower():
                return color_plate[well]
        raise ValueError(f"No well found with color {color_string}")

    def dispense_and_detach(pipette, volume, location):
        assert(isinstance(volume, (int, float)))
        above_location = location.move(types.Point(z=location.point.z + 5))
        pipette.move_to(above_location)
        pipette.dispense(volume, location)
        pipette.move_to(above_location)

    ### YOUR DESIGN DATA ###
sfgfp_points = [(-4.4, 26.4),(1.1, 26.4),(2.2, 26.4),(3.3, 26.4),(4.4, 26.4),(-6.6, 25.3),(-5.5, 25.3),(-4.4, 25.3),(-3.3, 25.3),(1.1, 25.3),(4.4, 25.3),(-12.1, 24.2),(-11, 24.2),(-9.9, 24.2),(-8.8, 24.2),(-7.7, 24.2),(-6.6, 24.2),(-2.2, 24.2),(0, 24.2),(4.4, 24.2),(-13.2, 23.1),(-12.1, 23.1),(-7.7, 23.1),(-6.6, 23.1),(-2.2, 23.1),(4.4, 23.1),(6.6, 23.1),(7.7, 23.1),(8.8, 23.1),(-13.2, 22),(-7.7, 22),(-6.6, 22),(-2.2, 22),(0, 22),(4.4, 22),(5.5, 22),(6.6, 22),(8.8, 22),(-13.2, 20.9),(-7.7, 20.9),(-2.2, 20.9),(3.3, 20.9),(4.4, 20.9),(9.9, 20.9),(-13.2, 19.8),(-7.7, 19.8),(-2.2, 19.8),(3.3, 19.8),(9.9, 19.8),(-13.2, 18.7),(-2.2, 18.7),(8.8, 18.7),(9.9, 18.7),(-13.2, 17.6),(-2.2, 17.6),(-1.1, 17.6),(8.8, 17.6),(-18.7, 16.5),(-17.6, 16.5),(-16.5, 16.5),(-15.4, 16.5),(-14.3, 16.5),(-12.1, 16.5),(-2.2, 16.5),(7.7, 16.5),(8.8, 16.5),(-20.9, 15.4),(-19.8, 15.4),(-18.7, 15.4),(-14.3, 15.4),(-13.2, 15.4),(-12.1, 15.4),(-2.2, 15.4),(-1.1, 15.4),(7.7, 15.4),(-20.9, 14.3),(-13.2, 14.3),(-12.1, 14.3),(-11, 14.3),(-3.3, 14.3),(-2.2, 14.3),(5.5, 14.3),(6.6, 14.3),(-20.9, 13.2),(-9.9, 13.2),(-8.8, 13.2),(-3.3, 13.2),(4.4, 13.2),(5.5, 13.2),(-20.9, 12.1),(-8.8, 12.1),(-7.7, 12.1),(-3.3, 12.1),(-2.2, 12.1),(2.2, 12.1),(3.3, 12.1),(5.5, 12.1),(6.6, 12.1),(7.7, 12.1),(8.8, 12.1),(9.9, 12.1),(11, 12.1),(-20.9, 11),(-19.8, 11),(-6.6, 11),(-5.5, 11),(-4.4, 11),(0, 11),(1.1, 11),(2.2, 11),(3.3, 11),(4.4, 11),(11, 11),(-19.8, 9.9),(-4.4, 9.9),(-3.3, 9.9),(-2.2, 9.9),(-1.1, 9.9),(0, 9.9),(11, 9.9),(-23.1, 8.8),(-22, 8.8),(-20.9, 8.8),(-19.8, 8.8),(-18.7, 8.8),(-17.6, 8.8),(-5.5, 8.8),(-4.4, 8.8),(-3.3, 8.8),(-2.2, 8.8),(-1.1, 8.8),(0, 8.8),(9.9, 8.8),(11, 8.8),(15.4, 8.8),(16.5, 8.8),(17.6, 8.8),(18.7, 8.8),(19.8, 8.8),(20.9, 8.8),(22, 8.8),(23.1, 8.8),(24.2, 8.8),(-23.1, 7.7),(-3.3, 7.7),(-1.1, 7.7),(0, 7.7),(1.1, 7.7),(2.2, 7.7),(8.8, 7.7),(9.9, 7.7),(14.3, 7.7),(15.4, 7.7),(16.5, 7.7),(20.9, 7.7),(22, 7.7),(24.2, 7.7),(-24.2, 6.6),(-23.1, 6.6),(-5.5, 6.6),(-4.4, 6.6),(-3.3, 6.6),(-2.2, 
6.6),(1.1, 6.6),(2.2, 6.6),(7.7, 6.6),(8.8, 6.6),(9.9, 6.6),(11, 6.6),(12.1, 6.6),(13.2, 6.6),(14.3, 6.6),(16.5, 6.6),(19.8, 6.6),(20.9, 6.6),(23.1, 6.6),(-22, 5.5),(-9.9, 5.5),(-7.7, 5.5),(-6.6, 5.5),(-5.5, 5.5),(-4.4, 5.5),(-2.2, 5.5),(3.3, 5.5),(4.4, 5.5),(13.2, 5.5),(16.5, 5.5),(18.7, 5.5),(19.8, 5.5),(23.1, 5.5),(-20.9, 4.4),(-19.8, 4.4),(-18.7, 4.4),(-17.6, 4.4),(-16.5, 4.4),(-15.4, 4.4),(-14.3, 4.4),(-13.2, 4.4),(-12.1, 4.4),(-9.9, 4.4),(-8.8, 4.4),(-7.7, 4.4),(-2.2, 4.4),(4.4, 4.4),(5.5, 4.4),(6.6, 4.4),(13.2, 4.4),(16.5, 4.4),(17.6, 4.4),(18.7, 4.4),(23.1, 4.4),(-12.1, 3.3),(-11, 3.3),(-2.2, 3.3),(5.5, 3.3),(7.7, 3.3),(8.8, 3.3),(9.9, 3.3),(11, 3.3),(12.1, 3.3),(13.2, 3.3),(16.5, 3.3),(17.6, 3.3),(23.1, 3.3),(-13.2, 2.2),(-12.1, 2.2),(-2.2, 2.2),(6.6, 2.2),(9.9, 2.2),(16.5, 2.2),(17.6, 2.2),(18.7, 2.2),(19.8, 2.2),(20.9, 2.2),(22, 2.2),(-14.3, 1.1),(-2.2, 1.1),(7.7, 1.1),(9.9, 1.1),(11, 1.1),(15.4, 1.1),(16.5, 1.1),(22, 1.1),(-15.4, 0),(-14.3, 0),(-2.2, 0),(7.7, 0),(11, 0),(14.3, 0),(15.4, 0),(22, 0),(-15.4, -1.1),(-2.2, -1.1),(3.3, -1.1),(8.8, -1.1),(13.2, -1.1),(14.3, -1.1),(20.9, -1.1),(-15.4, -2.2),(-14.3, -2.2),(-13.2, -2.2),(-12.1, -2.2),(-8.8, -2.2),(-7.7, -2.2),(-6.6, -2.2),(-2.2, -2.2),(3.3, -2.2),(8.8, -2.2),(11, -2.2),(12.1, -2.2),(13.2, -2.2),(20.9, -2.2),(-14.3, -3.3),(-11, -3.3),(-8.8, -3.3),(-7.7, -3.3),(-6.6, -3.3),(-3.3, -3.3),(-2.2, -3.3),(3.3, -3.3),(4.4, -3.3),(8.8, -3.3),(11, -3.3),(19.8, -3.3),(20.9, -3.3),(-13.2, -4.4),(-12.1, -4.4),(-11, -4.4),(-8.8, -4.4),(-3.3, -4.4),(-2.2, -4.4),(4.4, -4.4),(5.5, -4.4),(6.6, -4.4),(7.7, -4.4),(8.8, -4.4),(13.2, -4.4),(19.8, -4.4),(-16.5, -5.5),(-15.4, -5.5),(-14.3, -5.5),(-13.2, -5.5),(-8.8, -5.5),(-4.4, -5.5),(-3.3, -5.5),(-2.2, -5.5),(4.4, -5.5),(14.3, -5.5),(15.4, -5.5),(16.5, -5.5),(18.7, -5.5),(19.8, -5.5),(-19.8, -6.6),(-18.7, -6.6),(-17.6, -6.6),(-16.5, -6.6),(-8.8, -6.6),(-4.4, -6.6),(-1.1, -6.6),(3.3, -6.6),(4.4, -6.6),(17.6, -6.6),(18.7, -6.6),(-23.1, -7.7),(-22, -7.7),(-20.9, 
-7.7),(-19.8, -7.7),(-17.6, -7.7),(-16.5, -7.7),(-15.4, -7.7),(-8.8, -7.7),(-7.7, -7.7),(-6.6, -7.7),(-5.5, -7.7),(-4.4, -7.7),(-3.3, -7.7),(-2.2, -7.7),(-1.1, -7.7),(0, -7.7),(2.2, -7.7),(3.3, -7.7),(16.5, -7.7),(17.6, -7.7),(-24.2, -8.8),(-23.1, -8.8),(-14.3, -8.8),(-13.2, -8.8),(-8.8, -8.8),(-7.7, -8.8),(-3.3, -8.8),(-2.2, -8.8),(0, -8.8),(1.1, -8.8),(2.2, -8.8),(3.3, -8.8),(5.5, -8.8),(14.3, -8.8),(15.4, -8.8),(16.5, -8.8),(-26.4, -9.9),(-25.3, -9.9),(-24.2, -9.9),(-12.1, -9.9),(-11, -9.9),(-9.9, -9.9),(-8.8, -9.9),(-3.3, -9.9),(0, -9.9),(7.7, -9.9),(8.8, -9.9),(11, -9.9),(12.1, -9.9),(13.2, -9.9),(-27.5, -11),(-26.4, -11),(-25.3, -11),(-24.2, -11),(-23.1, -11),(-22, -11),(-20.9, -11),(-19.8, -11),(-18.7, -11),(-17.6, -11),(-16.5, -11),(-15.4, -11),(-14.3, -11),(-13.2, -11),(-12.1, -11),(-11, -11),(-3.3, -11),(0, -11),(-28.6, -12.1),(-27.5, -12.1),(-19.8, -12.1),(-18.7, -12.1),(-17.6, -12.1),(-15.4, -12.1),(-12.1, -12.1),(-4.4, -12.1),(-3.3, -12.1),(-2.2, -12.1),(0, -12.1),(-28.6, -13.2),(-27.5, -13.2),(-20.9, -13.2),(-19.8, -13.2),(-12.1, -13.2),(-4.4, -13.2),(0, -13.2),(-26.4, -14.3),(-25.3, -14.3),(-13.2, -14.3),(-12.1, -14.3),(-5.5, -14.3),(-2.2, -14.3),(0, -14.3),(-23.1, -15.4),(-20.9, -15.4),(-19.8, -15.4),(-13.2, -15.4),(-8.8, -15.4),(-7.7, -15.4),(-6.6, -15.4),(-1.1, -15.4),(0, -15.4),(-18.7, -16.5),(-16.5, -16.5),(-15.4, -16.5),(-14.3, -16.5),(-13.2, -16.5),(-12.1, -16.5),(-9.9, -16.5),(-8.8, -16.5),(-2.2, -16.5),(0, -16.5),(-2.2, -17.6),(0, -17.6),(0, -18.7),(-1.1, -19.8),(1.1, -19.8),(-1.1, -20.9),(1.1, -20.9),(2.2, -20.9),(0, -22),(3.3, -22),(12.1, -22),(13.2, -22),(14.3, -22),(15.4, -22),(0, -23.1),(4.4, -23.1),(5.5, -23.1),(9.9, -23.1),(11, -23.1),(12.1, -23.1),(13.2, -23.1),(1.1, -24.2),(2.2, -24.2),(5.5, -24.2),(6.6, -24.2),(7.7, -24.2),(8.8, -24.2),(9.9, -24.2),(11, -24.2),(12.1, -24.2),(2.2, -25.3),(3.3, -25.3),(4.4, -25.3),(9.9, -25.3),(11, -25.3),(5.5, -26.4),(6.6, -26.4),(7.7, -26.4),(8.8, -26.4)] mrfp1_points = [(-15.4, 12.1),(-14.3, 
12.1),(-14.3, 11),(-13.2, 11),(-12.1, 11)]
    mscarlet_i_points = [(-11, 20.9),(-9.9, 20.9),(-11, 19.8),(-9.9, 19.8),(-9.9, 18.7)]
    mko2_points = [(3.3, 18.7),(4.4, 18.7),(5.5, 18.7),(6.6, 18.7),(4.4, 17.6)]
    mjuniper_points = [(6.6, 9.9),(7.7, 9.9),(4.4, 8.8),(5.5, 8.8),(6.6, 8.8),(7.7, 8.8),(-6.6, 2.2),(-9.9, 1.1),(-8.8, 1.1),(-7.7, 1.1),(-6.6, 1.1)]
    electra2_points = [(1.1, 4.4),(1.1, 3.3),(1.1, 2.2),(1.1, 1.1)]

    # 2. UPDATED Design Mapping
    # Purple for the large petals, Pink for highlights, Blue for details.
    layers = [
        ('Purple', sfgfp_points),
        ('Pink', mrfp1_points),
        ('Pink', mscarlet_i_points),
        ('Pink', mko2_points),
        ('Blue', mjuniper_points),
        ('Blue', electra2_points)
    ]

    # 3. Execution Loop
    drop_vol = 1.0
    for color_name, points in layers:
        if not points:
            continue
        source_well = location_of_color(color_name)
        # One tip per chunk of up to 15 drops
        for i in range(0, len(points), 15):
            chunk = points[i:i + 15]
            pipette_20ul.pick_up_tip()
            aspirate_vol = (len(chunk) * drop_vol) + 2.0
            if aspirate_vol > 20.0:
                aspirate_vol = 20.0
            pipette_20ul.aspirate(aspirate_vol, source_well)
            for x, y in chunk:
                # Only dispense inside a 40 mm radius of the plate center (40**2 = 1600)
                if (x**2 + y**2) < 1600:
                    target_point = center_location.point + types.Point(x=x, y=y, z=0)
                    target_loc = types.Location(target_point, None)
                    dispense_and_detach(pipette_20ul, drop_vol, target_loc)
            # Return residual to source well top to avoid contamination
            if pipette_20ul.current_volume > 0:
                pipette_20ul.dispense(pipette_20ul.current_volume, source_well.top())
            pipette_20ul.drop_tip()

    Post-Lab Questions - DUE BY START OF FEB 24 LECTURE
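The chunking and volume logic of the execution loop can be checked off-robot before a lab slot. The following is a minimal sketch, not the protocol itself: it assumes the same constants as the script (15 drops per tip, 1.0 µL per drop, 2.0 µL extra, a 20 µL cap, and a 40 mm radius, since 1600 = 40²), and `plan_chunks` is a hypothetical helper name that is not part of the Opentrons API.

```python
def plan_chunks(points, chunk_size=15, drop_vol=1.0, extra_vol=2.0,
                max_vol=20.0, max_radius=40.0):
    """Return one (aspirate_volume, targets) pair per tip.

    Mirrors the loop in the protocol: aspirate one drop per point plus
    a small extra volume, capped at the pipette capacity, and skip any
    point that lies outside the circular plate area.
    """
    plan = []
    for start in range(0, len(points), chunk_size):
        chunk = points[start:start + chunk_size]
        aspirate_vol = min(len(chunk) * drop_vol + extra_vol, max_vol)
        targets = [(x, y) for x, y in chunk
                   if x**2 + y**2 < max_radius**2]
        plan.append((aspirate_vol, targets))
    return plan


# Example with the four electra2 points from the design data
electra2_points = [(1.1, 4.4), (1.1, 3.3), (1.1, 2.2), (1.1, 1.1)]
for volume, targets in plan_chunks(electra2_points):
    print(volume, len(targets))
```

Running this on each layer shows how many tips a colour will use and whether any point falls outside the plate before anything is dispensed.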

  • Week 4 HW: Protein Design Part I

    Part A. Conceptual Questions How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) About 3 x10^24 molecules of amino acids: 500 g ÷ 100 g/mol = 5 mol, and 5 × 6.022 x10^23 ≈ 3.0 x10^24 (treating the meat as pure amino acids). Why do humans eat beef but do not become a cow, eat fish but do not become fish? Humans break down the proteins in food into amino acids during digestion; these amino acids are then incorporated into human tissues or used for energy, and the leftover parts are excreted as normal.
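The amino acid estimate can be reproduced in a few lines. This is a rough sketch under the question's own simplification (~100 Da per amino acid); the `protein_fraction` parameter is an added assumption that lets you relax the idea that the meat is pure amino acids, and the 0.2 value in the example is illustrative only.

```python
AVOGADRO = 6.022e23  # molecules per mole


def amino_acid_molecules(mass_g, residue_mass_da=100.0, protein_fraction=1.0):
    """Estimate the number of amino acid molecules in a sample.

    mass_g: sample mass in grams.
    residue_mass_da: average amino acid mass in Daltons (g/mol).
    protein_fraction: assumed fraction of the mass that is amino acids.
    """
    moles = mass_g * protein_fraction / residue_mass_da
    return moles * AVOGADRO


print(f"{amino_acid_molecules(500):.3g}")                        # all mass as amino acids
print(f"{amino_acid_molecules(500, protein_fraction=0.2):.3g}")  # illustrative 20% protein
```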

  • Week 5 HW: Protein Design Part II

    Part 1: Generate Binders with PepMLM
    Changed 5th character from A to V.
    >sp|P00441|SOD1_HUMAN_A4V Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2
    MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
    AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
    HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
    2. Sequence 5 TKGNAGSRLACG WRGDDSVDFEGR 18.929941
    Part 2: Evaluate Binders with AlphaFold3 Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?
    Protein-peptide sequence 1: SSTLRLFAQLRR, ipTM = 0.55, pTM = 0.67
    The low ipTM suggests the peptide is surface-bound and only loosely associated with the outer part of the β-barrel. It is not buried: it sits on the exterior surface of the AlphaFold model and does not sit near the N-terminus where the A4V mutation is located.
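The A4V edit described above is a single-character substitution, and checking the expected residue first guards against off-by-one mistakes (A4V is residue 4 of the mature protein but character 5 of the FASTA sequence, which still carries the initiator methionine). A minimal sketch; `apply_substitution` is a hypothetical helper, and the wild-type fragment is simply the start of the sequence above with the fifth character reverted to A.

```python
def apply_substitution(sequence, position, expected, replacement):
    """Replace one residue, using 1-based positions as in FASTA viewers."""
    index = position - 1
    if sequence[index] != expected:
        raise ValueError(
            f"Expected {expected} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]


# First 24 residues only; the full sequence is handled the same way
wild_type_start = "MATKAVCVLKGDGPVQGIINFEQK"
mutant_start = apply_substitution(wild_type_start, 5, "A", "V")
print(mutant_start)
```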

  • Week 6 HW: Genetic Circuits Part I: Assembly Technologies

    Q1 What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? Phusion DNA Polymerase - synthesises new DNA strands by adding nucleotides complementary to the template, and its proofreading activity ensures high-fidelity copying. Deoxynucleotides (dNTPs) - each consists of a deoxyribose sugar, a nitrogenous base (A, T, C, or G), and three phosphate groups; the DNA polymerase adds them to the growing DNA chain to form the new complementary strand. Reaction Buffer - creates a suitable chemical environment for the Phusion High-Fidelity DNA Polymerase: it maintains a stable pH, provides magnesium ions as a cofactor for catalytic activity, and contains components that increase specificity, yield and fidelity. Magnesium Chloride - supplies the Mg2+ that DNA polymerase requires in its active site to catalyse the formation of phosphodiester bonds between the 3′-OH of the primer and the phosphate group of the incoming nucleotide. DMSO - an additive that improves denaturation of difficult or GC-rich targets. Q2 What are some factors that determine primer annealing temperature during PCR?

  • Week 7 HW: Genetic Circuits Part II: Neuromorphic Circuits

    Q1) What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions? (1) Reconfigurability: IANNs can be rearranged, remodelled, or changed into a new form, layout, or function after their initial creation, whereas Boolean genetic circuits are rigid and fixed. This reconfigurability means they can be used in analytical devices that integrate biological recognition elements (enzymes, microorganisms, antibodies) with transducers to detect bioavailable pollutants (microplastics, heavy metals) and check biotoxicity. (2) Less metabolic stress in cells with IANNs compared to Boolean genetic circuits. (3) Complex pattern recognition and analog computation: an IANN can weigh many variables at once (for example temperature, wind direction, traffic speed and humidity), derive relationships and biological signatures from them, and give a context-aware, probability-based result (a risk percentage or category). In a traditional genetic circuit, by contrast, the same processing step is a conditional if/then statement, and a Boolean circuit with n inputs is limited to 2^n input states. For example, if a city sensor receives an input that is “partly true” (e.g., moderate congestion but high humidity), a Boolean circuit may fail to trigger or may give an incorrect binary output.
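The contrast between a Boolean gate and an analog, neuron-like unit can be illustrated with the “partly true” city sensor example. This is a toy sketch only: the 0.7 threshold, the weights and the bias are made-up illustrative numbers, not parameters of any real IANN or genetic circuit.

```python
import math


def boolean_and_gate(congestion, humidity, threshold=0.7):
    """Boolean circuit: each input is first forced to True/False."""
    return (congestion >= threshold) and (humidity >= threshold)


def analog_neuron(inputs, weights, bias):
    """Weighted sum passed through a sigmoid, giving a graded 0..1 output."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Moderate congestion (0.5) but high humidity (0.9)
congestion, humidity = 0.5, 0.9
print(boolean_and_gate(congestion, humidity))                    # gate does not trigger
print(analog_neuron([congestion, humidity], [2.0, 2.0], -2.0))   # graded risk score
```

The gate discards the moderate congestion signal entirely, while the weighted unit still reports an elevated, graded score that a downstream layer could act on.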

  • Week 9 HW: Cell-free Systems

    Homework Part A: General and Lecturer-Specific Questions General homework questions [1] Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell free expression is more beneficial than cell production. Toxic or aggregation-prone proteins: if we want to produce a toxic protein, or proteins that form deposits (misfolded, non-functional and often highly stable multi-molecular structures), such as restriction enzymes which cut DNA or cytotoxic proteins (e.g. perforins, which puncture cell membranes, and granzymes, which trigger apoptosis, both produced by NK cells), cell-free expression allows production without affecting cell viability, unlike in vivo methods. Open reaction: because the barrier of the cell membrane is removed in cell-free systems, labeled or unnatural amino acids can be added directly to the reaction mix (which contains the translation machinery, such as ribosomes) for targeted, specialised protein synthesis, for example labeling for NMR or fluorescence studies. Focused resources: in vivo, the cell's energy and resources are divided between many organelles and processes, whereas in CFS they can be focused solely on making the target protein. Monitoring: protein synthesis can be tracked in real time using spectrophotometry or other analytical methods, because a CFS is an open, uncomplicated system compared with a living cell and its many organelles. Control: reaction conditions such as temperature, redox potential and pH can be changed and fine-tuned without worrying about causing host death. Speed: cloning and transformation steps are skipped, as we can use linear DNA or circular plasmid DNA directly. Two cases where CFS is beneficial over in vivo methods;

  • Week 10 HW: Advanced Imaging & Measurement Technology

    Homework: Final Project For your final project: Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc. I will assess the physical and chemical properties (stability, folding, binding) of my redesigned receptor-binding domains across multiple key metrics: (1) I will measure binding affinity: the thermodynamic strength of the binding between my redesigned T7 tail fiber and the target P. aeruginosa surface receptors (e.g., OprF, PilA). (2) I will measure structural stability using the Root Mean Square Deviation (RMSD) of the polypeptide chain over time to ensure the redesign hasn’t introduced metastability or a failure to adopt the native, functional conformation. (3) I will measure solubility and aggregation propensity: the likelihood that the protein remains soluble rather than forming insoluble inclusion bodies during recombinant expression. (4) To ensure functional efficiency and biosafety/biosecurity compliance, I will measure the Codon Adaptation Index (CAI) and check that the design contains no regulated DNA sequences from restricted, highly pathogenic agents (e.g. Ebola, SARS-CoV-2, Anthrax).
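For metric (2), RMSD is the square root of the mean squared distance between matched atoms in two structures. The sketch below assumes the two coordinate sets are already superimposed and list the same atoms in the same order; structure-analysis tools normally perform that alignment first, which this example does not attempt. The coordinates are invented for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("Coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: three atoms in a reference frame and a later frame (in angstroms)
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
later_frame = [(0.0, 0.1, 0.0), (1.5, 0.3, 0.0), (3.0, 0.0, 0.2)]
print(round(rmsd(reference, later_frame), 3))
```

Plotting this value for every frame of a simulation against the starting structure gives the RMSD-over-time curve described above.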

  • Week 11 HW: Bioproduction & Cloud Labs

    Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork (1) I didn’t manage to contribute to the artwork. (2) I liked the fact that every pixel removed or placed directly influences a cell-free protein synthesis optimisation experiment; it makes the art feel alive and purposeful. I have a couple of suggestions and ideas. Instead of removing pixels to end the experiment, perhaps next year could feature a growth versus decay mechanic where different biological inputs (represented by different colours) compete for dominance on the plate. It would also be interesting to have a secondary window showing a live feed or a time-lapse of the actual laboratory plate being manipulated by the cloud lab robots as we click, although the slider showing the bioart over time is incredible. To prevent griefing or to encourage rapid collaboration during peak hours, the cooldown could scale based on the complexity of the protein being synthesised in that specific quadrant.

  • Week 12 HW: Building Genomes

    Post Lab Questions | Mandatory for All Students (1) Which genes when transferred into E. coli will induce the production of lycopene and beta-carotene, respectively? While E. coli naturally possesses the MEP pathway to produce the precursors IPP and DMAPP, it lacks the downstream enzymes required to synthesize lycopene. To enable lycopene production, the following three genes are required: crtE: Encodes geranylgeranyl pyrophosphate (GGPP) synthase, crtB: Encodes phytoene synthase, crtI: Encodes phytoene desaturase

Subsections of Homework

Week 1 HW: Principles and Practices

9th February 2026

Tammy Sisodiya

Figure 1: Diagram of how the biocartridge works

QUESTION 2

I chose ensuring environmental and public health safety (Non-Maleficence) as the governance / policy goal. I defined some subgoals.

Subgoals:

  1. Microbial competition for natural resources like food, space and nutrients: invasive bacteria entering water bodies may outcompete native microbial populations thanks to selective, evolutionary advantages such as faster growth and more efficient nutrient use.

Policy requirement: We can use multi-layered biocontainment for bio-cartridges. For example, we can engineer the bacteria to metabolically require a nutrient which is not found in natural water bodies, so that they cannot outgrow and outcompete native microbes. We can also encapsulate the bacteria in a hydrogel-based system that is resistant to environmental conditions, ensuring they cannot escape into water bodies.

Specific metric to assess the policy: If no detergent or clothes-washing chemicals are present, the genetic kill switch should trigger automatically, either by conditionally overexpressing the toxin gene of a toxin-antitoxin system or by activating the selective nutrient dependency.

Traceability: We must use DNA barcoding for bacterial species identification. The bacterial strain will carry a non-functional genetic sequence which allows regulators to trace the bacteria to their manufacturer.

  2. Horizontal gene transfer - plastic recycling enzyme and antibiotic resistance genes transferring into pathogenic bacteria. If released, the plastic degrading enzymes may also affect important objects made of PET, such as dashboards, door panels, engine covers, ignition components, gear housings and connector housings in the automotive industry, and pill bottles in the pharmaceutical industry.

Policy requirement: We can ensure the bacteria do not carry antibiotic resistance genes in their genome that could be selected for and spread by horizontal gene transfer.

Specific metric to assess the policy: We can use gene editing technologies like CRISPR-Cas9 to ensure no antibiotic resistance genes remain to be transferred to pathogenic microbes. Policies must also establish that PETase’s plastic breakdown does not create chemicals which are more toxic than the microplastic itself.

When I asked Gemini, “What should I measure for governance/policy goals related to ensuring that this application or tool contributes to an ‘ethical’ future for a bacteria coculture based project?” the AI suggested Safety and Biocontainment Metrics, Environmental Justice & Sustainability & Equity and Global Governance (Google, 2026).

QUESTION 3

Proposed Governance Action: Regulating and approving the genetic kill and selective nutrient dependency mechanisms in bacteria (e.g., Synthetic Auxotrophy)
Actor: Federal regulators
Purpose: If there is damage or destruction to the bio cartridge causing escape of the microbes, the microbial survival rate can be set at <10^-8 (Cell Biology by the Numbers, 2015) to ensure that an escaped organism is highly unlikely to survive, reducing the biosafety threat to ecosystems.
Design: Federal regulators will need to approve the genetic kill switch mechanism, which uses toxin-antitoxin systems to kill the bacteria, and the gene editing that gives the engineered PETase plastic-degrading bacteria a metabolic dependency on a selective nutrient.
Assumptions: We know the microbial composition of our existing water bodies and the nutrient dependencies they have compared to our engineered microbe.
Risks of Failure & “Success”: Risk: Mutations may occur that bypass the engineered selective nutrient dependency. Success: Manufacturers can distinguish the bacteria they’ve edited from natural microbial populations and can understand and control their growth.

Proposed Governance Action: Providing an incentive for the sustainable upcycling of bio cartridges
Actor: Private companies and device manufacturing companies (Miele, Bosch, Hotpoint)
Purpose: Companies that currently produce filters do not combine user incentives with sustainability measures. Users will receive money off their energy bill or new cartridges when they return old cartridges for commercial sterilisation and nutrient reuse.
Design: Companies will finance the cartridge returning program, from the user’s home to the bio cartridge manufacturing / sterilising and nutrient reuse depot.
Assumptions: We can assume users will keep the cartridge until they receive confirmation to return it, instead of disposing of it in the bin.
Risks of Failure & “Success”: Risk: The fossil fuel use and carbon dioxide emissions from transporting cartridges could exceed the environmental benefit of the textile microplastics degradation and byproduct upcycling. Success: Promotes a circular economy.

Proposed Governance Action: Providing the no marker policy for bacteria
Actor: Academic researchers
Purpose: Researchers will use gene editing to insert a unique short genetic sequence into the engineered bacterial genome, so that researchers and federal regulators can identify the bacteria if they escape and trace them back to the institution where they were produced.
Design: We will create a database of all the engineered bacterial genetic sequences so that escaped microbes can be differentiated and identified.
Assumptions: We can assume the unique sequence in the bacterial genome will not mutate over time.
Risks of Failure & “Success”: Failure: Selective pressure may remove DNA which does not serve a function in the genome. Success: Chances are the no marker trait may be selectively picked.

QUESTION 4

5) PRIORITIES, TRADE-OFFS, ASSUMPTIONS AND ETHICAL CONCERNS

1 - Our audiences - The audiences are the United Nations Environment Programme (UNEP) Global Plastics Treaty and Industry Consortia. I prioritise regulation and extended producer responsibility (EPR) (OECD, 2001) as a combination of options. Why? Companies do not take liability for the ocean microplastics pollution problem, as no single party owns it, which is why regulation is needed. EPR then gives the incentive to use products like PHA to make other important products like sustainable packaging and agricultural films.

Bio cartridges, once filled, can be sent back and, with the circular economy approach, the PHA collected can be used for other purposes such as sustainable packaging, agricultural films, medical devices and consumer goods. EPR can therefore be applied by manufacturers: when the consumer’s cartridge is full, they can send it back for free and receive a discount on a new bio cartridge. A loyalty program will ensure that after 3 uses they receive a cartridge for free.

2 - Trade-offs include the cost of buying new cartridges and having them delivered, and of transporting old cartridges for cleaning and reuse and sending them back, particularly for households with less income.

Attaching the bio cartridge will make the machine more expensive. We can mitigate this by making the bio cartridge an optional but recommended addition for reducing microplastic pollution. The government may also help with subsidies.

Another trade-off is the biosecurity risk associated with bacteria like B. subtilis and P. putida being used in the home. We mitigate this by immobilising the bacteria, anchoring them to a solid cellulose membrane, which increases biosafety and decreases the risk of contaminating water bodies in the event the cartridge is damaged/broken.

3 - Assumptions and Uncertainties

The assumptions we make are:

Our PETase enzyme can survive the high alkalinity of detergents in the washing machine, even when optimised in silico. Users will return bio cartridges rather than disposing of them in the bin; otherwise the circular economy approach using P. putida in layer 2 is lost. The value of the harvested PHA will far outweigh the cost of cartridge return, sterilisation, nutrient recycling and sending the cartridge back to the consumer. What we do not know is whether P. putida will lose its upcycling ability by mutating over multiple generations.

4 - Ethical concerns we may have

Possible leakage of microbes into natural water bodies (a biocontainment hazard).

Is all microplastic pollution the individual’s responsibility? The cost of using the biocartridge may disadvantage lower income households if it is not offered as an optional addition, and it shifts all liability from manufacturers to individuals. Laundry must be affordable for all.

What are the proposed governance actions for these ethical concerns?

  • We can use genetic kill switches that kill the bacteria unless they are grown on a selective nutrient, ensuring metabolic dependency on it.
  • Manufacturers fund the return of cartridges by consumers, their thorough cleaning and their redelivery, so the cost does not fall on the consumer.
  • Biosafety regulators will certify all bio cartridges, making sure that they are tested for breakage/damage and that the non-functional DNA watermark has been added to the bacterial genome and to the (fully open source) DNA barcoding database.

Week 2 Lecture Prep

Homework Questions from Professor Jacobson:

  1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy? The error rate of DNA polymerase is between 10^-4 and 10^-6 per base. The human genome is composed of 3 billion base pairs, so even at the lower rate of 10^-6 every cell division would end up with 3,000–6,000 mistakes. Biology deals with the discrepancy through 3′→5′ exonuclease activity, which proofreads the DNA for errors and removes wrong bases straight away. After DNA is replicated, mismatch repair occurs: the freshly replicated DNA is checked for mismatches, and the incorrect base is removed and replaced with the right one.

  2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest? Total variations which are possible with 4 bases and 1036 base pairs for an average human protein = 4^1036.

So log10(4^1036) = 1036 * log10(4) = 1036 * 0.60206 = 623.734. This results in approx 10^0.734 * 10^623 = 5.42 * 10^623 possible DNA sequences. Some of these sequences contain stop codons which terminate translation early; others may form a hairpin structure which blocks polymerases or prevents the ribosome from binding, so no protein is made. Codon choice also matters for the organism (bacteria and humans prefer different codons): if a sequence uses codons for which the cell has few transfer RNAs, the ribosome will stall and produce little or no protein.
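Both pieces of arithmetic in the two answers above can be checked directly, since Python integers have arbitrary precision. This is a small sketch using only the figures quoted in the answers (3 billion base pairs, error rates of 10^-4 to 10^-6, and a length of 1036 bases); the function names are illustrative.

```python
import math

GENOME_BP = 3_000_000_000  # haploid human genome size used in the answer


def expected_errors(error_rate, bases=GENOME_BP):
    """Expected number of miscopied bases per replication, before any repair."""
    return error_rate * bases


def count_sequences(alphabet_size, length):
    """Return the exact count of sequences and its base-10 logarithm."""
    exact = alphabet_size ** length
    log10_value = length * math.log10(alphabet_size)
    return exact, log10_value


for rate in (1e-4, 1e-6):
    print(f"error rate {rate:g}: about {expected_errors(rate):,.0f} errors per copy")

exact, log10_value = count_sequences(4, 1036)
digits = str(exact)
print(f"log10(4^1036) = {log10_value:.3f}")
print(f"4^1036 is about {digits[0]}.{digits[1:3]} x 10^{len(digits) - 1}")
```

Passing `2 * GENOME_BP` as `bases` gives the diploid figure, which is where the upper end of the 3,000–6,000 range comes from.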

Homework Questions from Dr. LeProust:

  1. What’s the most commonly used method for oligo synthesis currently? The most common method used is solid phase phosphoramidite chemistry.

  2. Why is it difficult to make oligos longer than 200nt via direct synthesis? The quality and the amount of product obtained from chemical synthesis become poor once the sequence length exceeds 200nt.

  3. Why can’t you make a 2000bp gene via direct oligo synthesis? A 2000bp gene cannot be made by direct oligo synthesis because the product would be error prone, the amount obtained would be small and the quality would be reduced. Directly synthesised sequences should be no longer than about 150–200 bases.
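One common way to see why yield collapses with length is stepwise coupling efficiency, which the answers above do not mention explicitly: if each base addition succeeds with probability p, the fraction of full-length product after n bases is roughly p^(n-1). The 99% efficiency used below is an illustrative assumption, not a figure from the lecture, and the model ignores other error sources such as depurination.

```python
def full_length_fraction(length_nt, coupling_efficiency=0.99):
    """Approximate fraction of strands that reach full length.

    A strand of length_nt bases needs (length_nt - 1) successful couplings
    onto the first, support-bound base.
    """
    return coupling_efficiency ** (length_nt - 1)


for length in (20, 200, 2000):
    print(length, f"{full_length_fraction(length):.3g}")
```

Under this assumption a 200-mer is already a small minority of the product and a 2000-mer is effectively absent, which is why genes of that size are assembled from shorter oligos instead.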

Homework Question from Prof. Church:

What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”? Histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan and valine, plus arginine, which is essential in many animals although only conditionally essential in humans. Mammals cannot produce lysine in their bodies and rely on getting it from foods such as beef, pork, chicken, fish (cod, sardines), dairy and soy products. This is at odds with the premise of the “Lysine Contingency”, in which dinosaurs are genetically modified to be unable to produce lysine, since animals already lack that ability (Lopez & Mohiuddin, 2024).

References

Bao, T., Qian, Y., Xin, Y., Collins, J. J., & Lu, T. (2023). Engineering microbial division of labor for plastic upcycling. Nature Communications, 14(1), 5712. https://doi.org/10.1038/s41467-023-40777-x

Cell biology by the numbers. (2015). Garland Science. https://doi.org/10.1201/9780429258770

Federley, R. G., & Romano, L. J. (2010). DNA polymerase: Structural Homology, Conformational Dynamics, and the Effects of Carcinogenic DNA Adducts. Journal of Nucleic Acids, 2010, 457176. https://doi.org/10.4061/2010/457176

Google. (2026). Gemini (Feb 10 version) [Large language model]. https://gemini.google.com

“Kill switch” design strategies for genetically modified organisms | Physical and Life Sciences Directorate. (n.d.). Retrieved 10 February 2026, from https://pls.llnl.gov/article/26016/kill-switch-design-strategies-genetically-modified-organisms

Lopez, M. J., & Mohiuddin, S. S. (2025). Biochemistry, essential amino acids. In StatPearls. StatPearls Publishing. http://www.ncbi.nlm.nih.gov/books/NBK557845/

Microfibres: The plastic in our clothes | Friends of the Earth. (n.d.). Retrieved 10 February 2026, from https://friendsoftheearth.uk/plastics/microfibres-plastic-in-our-clothes

Microfiber filter. (n.d.). PlanetCare. Retrieved 10 February 2026, from https://planetcare.org/pages/microfiber-filter-washing-machine

OECD. (2001). Extended producer responsibility: A guidance manual for governments. OECD. https://doi.org/10.1787/9789264189867-en

Planetcare | the most effective solution to stop microfiber pollution. (n.d.). PlanetCare. Retrieved 10 February 2026, from https://planetcare.org/

Team:Exeter/Hardware—2019.igem.org. (n.d.). Retrieved 10 February 2026, from https://2019.igem.org/Team:Exeter/Hardware

Vassilenko, K., Watkins, M., Chastain, S., Posacka, A., & Ross, P. S. (2019). Me, my clothes and the ocean: The role of textiles in microfiber pollution (Ocean Wise Science Feature). Ocean Wise Conservation Association. https://assets.ctfassets.net/fsquhe7zbn68/4MQ9y89yx4KeyHv9Svynyq/8434de64585e9d2cfbcd3c46627c7a4a/Research_MicrofibersReport_191004-e.pdf

World Health Organization. (2022). Global guidance framework for the responsible use of the life sciences: Mitigating biorisks and governing dual-use research. https://www.who.int/publications/i/item/9789240056107

DNA Animation from LottieFiles: https://lottiefiles.com/free-animation/genetics-iHhxPhbgLp

Week 2 HW: DNA Read, Write, & Edit

Part 1: Benchling & In-silico Gel Art

[Image: virtual_digest_sequence_LAMCG (3).png]

Part 3: DNA Design Challenge

3.1. Choose your protein.

I chose the PETase enzyme from the bacterium Ideonella sakaiensis (strain 201-F6). I chose this protein because it was discovered to be important in plastic degradation. Its ability to degrade plastic enables bioremediation, reducing plastic pollution and promoting a circular economy.

FASTA sequence of PETase:

>sp|A0A0K8P6T7|PETH_PISS1 Poly(ethylene terephthalate) hydrolase OS=Piscinibacter sakaiensis OX=1547922 GN=ISF6_4831 PE=1 SV=1
MNFPRASRLMQAAVLGGLMAVSAAATAQTNPYARGPNPTAASLEASAGPFTVRSFTVSRP
SGYGAGTVYYPTNAGGTVGAIAIVPGYTARQSSIKWWGPRLASHGFVVITIDTNSTLDQP
SSRSSQQMAALRQVASLNGTSSSPIYGKVDTARMGVMGWSMGGGGSLISAANNPSLKAAA
PQAPWDSSTNFSSVTVPTLIFACENDSIAPVNSSALPIYDSMSRNAKQFLEINGGSHSCA
NSGNSNQALIGKKGVAWMKRFMDNDTRYSTFACENPNSTRVSDFRTANCS

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

Reverse translation of PETase:

Reverse Translate Results for 290 residue sequence “Untitled” starting “MNFPRASRLM”

reverse translation of Untitled to a 870 base sequence of most likely codons.
atgaactttccgcgcgcgagccgcctgatgcaggcggcggtgctgggcggcctgatggcg
gtgagcgcggcggcgaccgcgcagaccaacccgtatgcgcgcggcccgaacccgaccgcg
gcgagcctggaagcgagcgcgggcccgtttaccgtgcgcagctttaccgtgagccgcccg
agcggctatggcgcgggcaccgtgtattatccgaccaacgcgggcggcaccgtgggcgcg
attgcgattgtgccgggctataccgcgcgccagagcagcattaaatggtggggcccgcgc
ctggcgagccatggctttgtggtgattaccattgataccaacagcaccctggatcagccg
agcagccgcagcagccagcagatggcggcgctgcgccaggtggcgagcctgaacggcacc
agcagcagcccgatttatggcaaagtggataccgcgcgcatgggcgtgatgggctggagc
atgggcggcggcggcagcctgattagcgcggcgaacaacccgagcctgaaagcggcggcg
ccgcaggcgccgtgggatagcagcaccaactttagcagcgtgaccgtgccgaccctgatt
tttgcgtgcgaaaacgatagcattgcgccggtgaacagcagcgcgctgccgatttatgat
agcatgagccgcaacgcgaaacagtttctggaaattaacggcggcagccatagctgcgcg
aacagcggcaacagcaaccaggcgctgattggcaaaaaaggcgtggcgtggatgaaacgc
tttatggataacgatacccgctatagcacctttgcgtgcgaaaacccgaacagcacccgc
gtgagcgattttcgcaccgcgaactgcagc
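The "most likely codon" approach can be sketched as a simple lookup. The table below only covers the residues in the first ten amino acids of PETase, with codons copied from the output above; it is an illustrative fragment, not the tool's full codon table, and the names `MOST_LIKELY_CODON` and `reverse_translate` are introduced here.

```python
# Codons copied from the reverse-translation output above (partial table,
# only the residues needed for the example peptide).
MOST_LIKELY_CODON = {
    "M": "atg", "N": "aac", "F": "ttt", "P": "ccg",
    "R": "cgc", "A": "gcg", "S": "agc", "L": "ctg",
}

def reverse_translate(protein):
    """Replace each amino acid with one fixed codon from the table."""
    missing = sorted(set(protein) - set(MOST_LIKELY_CODON))
    if missing:
        raise KeyError(f"no codon in this partial table for: {missing}")
    return "".join(MOST_LIKELY_CODON[aa] for aa in protein)

dna = reverse_translate("MNFPRASRLM")
print(dna)  # atgaactttccgcgcgcgagccgcctgatg
```

Because each amino acid always maps to the same codon, this method is deterministic but ignores context such as secondary structure or repeated motifs.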

3.3 Codon optimization

Codon optimization is important for producing the maximum amount of protein quickly and efficiently. Rare codons are matched by only a few tRNAs, so the ribosome, which is responsible for protein synthesis, can stall while waiting for one.

If the ribosome stalls, the protein may misfold and lose its proper three-dimensional shape, which impairs its activity.

I chose E. coli because its genome is well documented in the literature, it grows very fast, and it is very frequently used as a host for recombinant DNA technology. Moreover, the DNA sequence can be adapted to the codon preferences of the E. coli translation machinery, allowing researchers to produce PETase more quickly and in larger amounts.

Codon optimised sequence for PETase:

ATGAACTTTCCACGTGCCTCCCGTCTGATGCAGGCAGCTGTGCTGGGTGGCCTGATGGCG
GTTAGCGCCGCAGCAACTGCTCAGACCAATCCGTACGCGCGTGGCCCGAACCCGACTGCC
GCGAGCCTGGAGGCCAGCGCAGGTCCGTTCACCGTACGCAGCTTTACCGTGAGCCGTCCG
AGCGGCTACGGCGCAGGTACCGTGTATTACCCGACCAACGCAGGCGGTACCGTAGGCGCA
ATTGCGATTGTGCCGGGCTATACCGCACGCCAGAGCTCCATCAAGTGGTGGGGCCCGCGT
CTGGCCAGCCACGGTTTCGTAGTGATCACCATCGATACCAACAGCACCCTGGATCAGCCG
AGCTCCCGTAGCTCCCAGCAGATGGCAGCACTGCGTCAGGTGGCATCCCTGAACGGTACC
AGCTCCAGCCCGATCTACGGCAAGGTAGATACCGCACGTATGGGCGTGATGGGTTGGAGC
ATGGGCGGTGGCGGCAGCCTGATCTCCGCAGCAAACAACCCGAGCCTGAAAGCAGCAGCA
CCGCAGGCACCGTGGGATAGCTCCACCAACTTCAGCTCCGTGACCGTGCCGACCCTGATC
TTCGCATGCGAAAACGATAGCATCGCACCGGTGAACAGCTCCGCACTGCCGATCTACGAT
AGCATGAGCCGTAACGCGAAACAGTTCCTGGAAATCAACGGTGGCTCCCACAGCTGCGCC
AACAGCGGCAACAGCAACCAGGCATTGATCGGCAAGAAAGGTGTGGCCTGGATGAAACGT
TTCATGGATAACGATACCCGCTACTCCACCTTCGCATGCGAAAACCCGAACAGCACCCGT
GTGAGCGATTTCCGCACCGCAAACTGCAGC
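A useful sanity check is that codon optimisation changes the DNA but not the protein. The sketch below translates the first 30 bases of the reverse-translated and codon-optimised sequences with the standard genetic code and compares the results; the helper name `translate` is introduced here for illustration.

```python
from itertools import product

# Standard genetic code, ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna):
    """Translate DNA in frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

original = "atgaactttccgcgcgcgagccgcctgatg"   # first 30 nt, reverse translation
optimised = "ATGAACTTTCCACGTGCCTCCCGTCTGATG"  # first 30 nt, codon optimised

print(translate(original), translate(optimised))
```

The same function can be run on the full 870-base sequences to confirm that both encode the 290-residue PETase protein.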

3.4. You have a sequence! Now what?

What technologies could be used to produce this protein from your DNA? Describe in your own words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

Firstly, I would use a cell-dependent protein production method such as recombinant DNA technology. The optimised DNA is inserted into a plasmid using enzymes such as restriction enzymes or recombinases, together with ligases, to cut and join the DNA. The plasmid is then transformed into an E. coli culture, the bacteria are grown, and IPTG is added to induce recombinant protein expression in the cells. Finally, the cells are lysed and the PETase protein is isolated and purified (Schütz et al., 2023).

Another technology I could use is a cell-free protein production method. This involves growing cells to a high density, bursting them open under high pressure and then using centrifugation to isolate the protein synthesis machinery, such as ribosomes and enzymes. Finally, the optimised DNA is added to the isolated cell extract (the supernatant), which transcribes and translates it to produce PETase.

Part 4: Prepare a Twist DNA Synthesis Order

https://benchling.com/s/seq-oY3lJJH8RajCqnHCK4RG?m=slm-GAs1mnavmWg4pBOHzVnp

.fasta file for TA review :)

>PETase_2
TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGCCATTAAAGAGGAGAAAGGTACCATGAACTTTCCGCGCGCGAGCC
GCCTGATGCAGGCGGCGGTGCTGGGCGGCCTGATGGCGGTGAGCGCGGCGGCGACCGCGCAGACCAACCCGTATGCGCG
CGGCCCGAACCCGACCGCGGCGAGCCTGGAAGCGAGCGCGGGCCCGTTTACCGTGCGCAGCTTTACCGTGAGCCGCCCG
AGCGGCTATGGCGCGGGCACCGTGTATTATCCGACCAACGCGGGCGGCACCGTGGGCGCGATTGCGATTGTGCCGGGCT
ATACCGCGCGCCAGAGCAGCATTAAATGGTGGGGCCCGCGCCTGGCGAGCCATGGCTTTGTGGTGATTACCATTGATAC
CAACAGCACCCTGGATCAGCCGAGCAGCCGCAGCAGCCAGCAGATGGCGGCGCTGCGCCAGGTGGCGAGCCTGAACGGC
ACCAGCAGCAGCCCGATTTATGGCAAAGTGGATACCGCGCGCATGGGCGTGATGGGCTGGAGCATGGGCGGCGGCGGCA
GCCTGATTAGCGCGGCGAACAACCCGAGCCTGAAAGCGGCGGCGCCGCAGGCGCCGTGGGATAGCAGCACCAACTTTAG
CAGCGTGACCGTGCCGACCCTGATTTTTGCGTGCGAAAACGATAGCATTGCGCCGGTGAACAGCAGCGCGCTGCCGATT
TATGATAGCATGAGCCGCAACGCGAAACAGTTTCTGGAAATTAACGGCGGCAGCCATAGCTGCGCGAACAGCGGCAACA
GCAACCAGGCGCTGATTGGCAAAAAAGGCGTGGCGTGGATGAAACGCTTTATGGATAACGATACCCGCTATAGCACCTT
TGCGTGCGAAAACCCGAACAGCACCCGCGTGAGCGATTTTCGCACCGCGAACTGCAGCCATCACCATCACCATCATCAC
TAACCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGC
TCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATAATA

4.6. Choose Your Vector

https://benchling.com/s/seq-qZFBnXDT1X5Wyl0Mw1YJ?m=slm-FoxAu2Oeo4nJoQJoTZMP

Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

I would want to sequence the gene for the primary surface protein (the protein that largely determines immunogenicity) of a common virus, such as a localized variant of norovirus.

Why I would sequence it:

If I sequence the gene for the major capsid protein, VP1, which is encoded by Open Reading Frame 2 (ORF2), from environmental samples (such as water bodies), we can detect the virus in a city before it spreads widely. This acts as an early warning system.

Viruses mutate quickly, so sequencing allows us to monitor how the virus evolves over time and to see whether it is becoming more transmissible or has developed resistance to antiviral medications.

To design a vaccine or treatment, scientists need the genetic code of the rapidly mutating viral surface proteins that trigger the immune response. Sequencing tells us which amino acids make up the protein, so that a replica can be made to trigger the immune response.

Sequencing can also distinguish between variants, for example a normal strain from a mutated strain, so changes are detected quickly and patients can be treated early.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

Also answer the following questions:

Is your method first-, second- or third-generation or other? How so? What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps. What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)? What is the output of your chosen sequencing technology?

Nanopore sequencing, a third-generation method, is the technology I would choose because it can sequence extremely long DNA fragments and because data are analysed in real time as each molecule passes through the pore, giving actionable insights into viral mutations and pathogen surveillance. It is third-generation because it reads single molecules directly, so it does not require PCR amplification of the DNA beforehand (MacKenzie and Argyropoulos, 2023).

My input will be genomic DNA or RNA taken from a viral sample.

Preparation of input: The DNA can be fragmented to the required length, although nanopore sequencing can also read long, unfragmented molecules.

The ends of the DNA will be repaired to make them blunt.

Adapter ligation will allow adaptors on the ends of the DNA to pull the DNA into the nanopore.

A tether molecule is added to help DNA find the pore on the sensor.

Nanopore sequencing differs from other methods in that it detects bases electrically rather than with chemicals or light.

The DNA is passed through a nanopore embedded in an electrically resistant membrane while a constant current flows through the pore. As the DNA strand moves through the pore, the bases A, T, C and G each block the opening in a different way.

Because each base has a different shape and size, it causes a characteristic disruption in the electrical current. Deep learning models analyse these current shifts over time to determine the base sequence.

The raw output is an electrical signal file, commonly in .fast5 or .pod5 format.

The processed (base-called) output of these “squiggles” is a FASTQ file. It contains the base sequence (A, T, C, G) along with a quality score showing how confident the machine was about each base it read (MacKenzie and Argyropoulos, 2023).
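To make the FASTQ output concrete, the sketch below parses one four-line record and converts its quality characters into Phred scores and error probabilities. The record itself is made up for illustration, the helper names are introduced here, and the standard Phred+33 character encoding is assumed.

```python
def parse_fastq_record(lines):
    """Parse one 4-line FASTQ record into (name, sequence, phred_scores)."""
    header, sequence, separator, quality = (line.strip() for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Phred+33 encoding: quality score = ASCII code - 33
    phred = [ord(ch) - 33 for ch in quality]
    return header[1:], sequence, phred

def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)

record = ["@read_1", "GATTACA", "+", "II5I+I!"]  # hypothetical read
name, seq, phred = parse_fastq_record(record)
for base, q in zip(seq, phred):
    print(base, q, f"{error_probability(q):.4f}")
```

A score of 20 corresponds to a 1% chance that the base is wrong, which is the sense in which the file reports the machine's confidence.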

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)

I would like to synthesise the DNA for a combinatorial nanobody display library (VHH domains). In drug discovery, antibodies offer unparalleled specificity and high binding strength to their target, and they can modulate disease-related proteins, which positions them as directed, efficacious therapies in cancer medicine, immunology and infectious diseases. Viruses carry antigenic markers (epitopes): distinct, localised parts of a foreign molecule that are directly recognised and bound by antibodies, B cells or T cells of the immune system. Conventional antibodies, however, are large and have difficulty reaching some of these epitopes to neutralise the viral particle or cancer cell. Nanobodies, which are small and can be highly resistant to heat, extreme pH and protein-cleaving enzymes, can reach concealed epitopes on viruses or cancer cells that larger human antibodies cannot (Muyldermans, 2021).

(ii) What technology or technologies would you use to perform this DNA synthesis and why? Also answer the following questions:

What are the essential steps of your chosen sequencing methods? What are the limitations of your sequencing method (if any) in terms of speed, accuracy, scalability?

Answer: I plan to use trinucleotide-directed mutagenesis to generate antibody diversity by creating mutations in the complementarity-determining regions (CDRs) whilst avoiding termination codons and frameshift variants.

We will use PCR to generate diversity in CDR3, a highly variable and flexible region that contacts the virus epitope, by introducing different random combinations to reshuffle the DNA and create a large number of nanobody genes.

Phages will be used to display the nanobodies encoded by this DNA on their surface. The disease, viral or infectious target proteins will be mixed with the phages, and only phages with specificity and affinity for that protein will remain bound.

We will then use next-generation sequencing and artificial intelligence to analyse the remaining phages and iteratively find better versions.

Essential steps:

  • We will divide the nanobody gene into its constituent parts: CDR1, CDR2, CDR3 and the framework regions.
  • We will randomise CDR3 by PCR, using primers that insert random DNA bases at the antigen-binding sites.
  • The DNA parts have overlapping ends, so the fragments can be stitched together in the correct order.
  • We use a DNA polymerase to amplify the DNA, creating a complete, double-stranded nanobody gene.
  • The DNA sequence must be efficiently read by the organism it is grown in, so codon optimisation for the host organism is important.
  • We introduce mutations only in selected regions, in a way that does not create stop codons or frameshift mutations (Webmaster, 2024).

The Inputs:

DNA template: VHH gene.

Primers: Short, bespoke DNA strands. Some will match the VHH gene scaffold, whilst others will contain random mutations.

Enzymes: DNA Polymerase for building DNA and Restriction Enzymes to cut the DNA for plasmid insertion.

Plasmid vectors: the plasmid vector will carry the new library into cells.

E. coli cells: will help to replicate the nanobody gene library.

Limitations of the method:

  1. E. coli transformation efficiency: this limits how much of the VHH DNA library can be taken up by the cells.

  2. Primer bias: we may get less diverse randomised DNA if the same bases are chosen again and again.

  3. PCR amplification may introduce errors whilst the DNA is being replicated.

  4. Randomising DNA may introduce stop codons. If one occurs in the middle of the nanobody gene, translation stops halfway through and we end up with a useless fragment, like half of a key.
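The stop-codon limitation can be quantified with a small calculation. This sketch compares fully random codons (NNN) with the common NNK scheme (third base restricted to G or T), assuming all bases are equally likely; the 12-codon CDR3 length is a hypothetical value for illustration, and the function names are introduced here.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}

def stop_fraction(first, second, third):
    """Fraction of codons that are stops, given allowed bases per position."""
    codons = ["".join(c) for c in product(first, second, third)]
    return sum(codon in STOP_CODONS for codon in codons) / len(codons)

def prob_stop_free(per_codon_stop_fraction, n_codons):
    """Probability that n independently randomised codons contain no stop."""
    return (1 - per_codon_stop_fraction) ** n_codons

nnn = stop_fraction("ACGT", "ACGT", "ACGT")  # 3 stops out of 64 codons
nnk = stop_fraction("ACGT", "ACGT", "GT")    # only TAG remains, 1 of 32

CDR3_CODONS = 12  # hypothetical number of randomised positions
print(f"NNN: {prob_stop_free(nnn, CDR3_CODONS):.2f} of clones stop-free")
print(f"NNK: {prob_stop_free(nnk, CDR3_CODONS):.2f} of clones stop-free")
```

Trinucleotide (codon-level) synthesis avoids the problem altogether, because only chosen sense codons are ever incorporated.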

5.3 DNA Edit

(i) What DNA would you want to edit and why?

In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?

I would choose to edit out the highly conserved long terminal repeats (LTRs) that flank the integrated HIV genome and act as promoter and regulatory sequences for viral transcription and replication. HIV integrates into host CD4 T helper cells, which help the adaptive arm of the immune system to recognise and destroy infected or cancerous cells, control immune responses and provide long-lasting immunity (Hu & Hughes, 2012). HIV affects millions of people worldwide, and neither antiretroviral therapy nor the immune system can easily eliminate the virus while it remains dormant in host T cells. Furthermore, antiretroviral resistance may develop over time, and patients must adhere strictly to their medication to prevent HIV replication cycles (Hu & Hughes, 2012).

(ii) What technology or technologies would you use to perform these DNA edits and why?

I would use the CRISPR-Cas9 gene editing system to cut at the LTRs, which are identical at both ends of the provirus, and so remove HIV from the CD4 T cells. This would stop the dormant, latent virus reservoir from growing, and because the LTRs are highly conserved we can use a single sgRNA to target both (Hu & Hughes, 2012; Wang et al., 2018).

Also answer the following questions:

How does your technology of choice edit DNA? What are the essential steps?

CRISPR-Cas9 works as follows:

1 - We make sgRNAs that are complementary to the LTR target sequence (at the 5’ and 3’ LTR ends of the integrated HIV genome).

2 - The sgRNA binds the Cas9 nuclease and directs it to the matching DNA sequences within the HIV-1 LTR, where Cas9 produces double-strand breaks.

3 - The breaks trigger the cell’s repair mechanisms (non-homologous end joining), which introduce indel mutations that inactivate the virus and disrupt viral transcription at the promoter region. This in turn prevents HIV replication (Asmamaw & Zawdie, 2021).

What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?

  1. Preparation and design.

Firstly, we must find an LTR region that is conserved across different HIV strains to target with our sgRNAs, while avoiding sequences that would cause targeting and cleavage of the host’s own DNA.

Secondly, the LTR target sequence can mutate rapidly in HIV, so we should use a database of known mutant sequences and design our sgRNAs to cover different mutant strains.

Thirdly, we will perform off-target screening to prevent damage to the host’s own DNA: we must make sure the guide sequence does not match a human gene, particularly near the PAM, which Cas9 requires in order to cleave the target DNA and which lies 3-4 nucleotides downstream of the cut site.

We then have to examine the LTR, which is composed of three subsegments (U3, R and U5), and identify the elements that are essential for viral transcription and replication, and therefore survival. Previous research suggests that important targets include the NF-κB binding sites (NF-κB is a transcription factor that regulates cell growth and plays a role in cell development), the TATA box (a highly conserved promoter element that binds the TATA-binding protein to initiate transcription) and TAR (which is bound by the viral Tat protein to activate viral transcription) (Wang et al., 2018).

We then must look for a PAM region.
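As a rough illustration of the PAM search, the sketch below scans the forward strand of a sequence for 20-nt protospacers followed by an NGG PAM, the motif recognised by the commonly used SpCas9. The input sequence is a made-up toy string, not an HIV LTR, and the reverse strand is not searched in this sketch; `find_spcas9_targets` is a name introduced here.

```python
import re

def find_spcas9_targets(dna, protospacer_len=20):
    """Return (start, protospacer, pam, cut_index) for forward-strand NGG sites.

    cut_index is the position 3 bp upstream of the PAM, where Cas9
    typically makes its blunt cut.
    """
    dna = dna.upper()
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    return [
        (m.start(), m.group(1), m.group(2), m.start() + protospacer_len - 3)
        for m in pattern.finditer(dna)
    ]

toy_sequence = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"  # hypothetical input
hits = find_spcas9_targets(toy_sequence)
print(hits)
```

Each candidate would still need conservation and off-target screening before being chosen as a guide.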

Hairpin structures in the sgRNA may prevent the Cas nuclease from binding to it, so we can use in silico tools such as Mfold to check whether our sgRNA folds back on itself (Asmamaw & Zawdie, 2021).
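Before running a proper folding tool, a crude first pass is to look for short stretches of the guide that are reverse complements of a downstream stretch. The sketch below only checks exact Watson-Crick stems and ignores G-U wobble pairs and thermodynamics, so it is not a substitute for Mfold; the stem length, minimum loop size and function names are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    """Reverse complement of an RNA (or DNA) string, returned as RNA."""
    return rna.upper().replace("T", "U").translate(COMPLEMENT)[::-1]

def self_complementary_stems(rna, stem_len=4, min_loop=3):
    """Return (i, j, stem): rna[i:i+stem_len] could pair with the segment at j."""
    rna = rna.upper().replace("T", "U")
    hits = []
    for i in range(len(rna) - stem_len + 1):
        stem = rna[i:i + stem_len]
        partner = reverse_complement(stem)
        # The partner must start at least min_loop bases after the stem ends.
        j = rna.find(partner, i + stem_len + min_loop)
        while j != -1:
            hits.append((i, j, stem))
            j = rna.find(partner, j + 1)
    return hits

print(self_complementary_stems("GGGGAAAACCCC"))  # toy hairpin
print(self_complementary_stems("AAAAAAAAAAAA"))  # no pairing possible
```

Guides that return many or long stems here are worth examining more carefully with a thermodynamic folding tool.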

Physical inputs include components to order or create, such as:

  • Custom sgRNA oligos with a 20-nt targeting sequence, ordered from a vendor who can supply the RNA or a DNA template for it.
  • A plasmid for use in cell culture (pLentiCRISPR v2), into which the guide sequence is cloned next to the sgRNA scaffold; the plasmid expresses both the Cas9 nuclease and the sgRNA.
  • A pair of forward and reverse primers positioned outside the target site.
  • After editing, we will do PCR. One primer will be placed before the 5’ LTR and one after the 3’ LTR, so PCR amplifies millions of copies of only the DNA between those two primers.
  • We will then run gel electrophoresis. The unedited HIV DNA control is large and will stay near the top of the gel, whereas a successfully edited genome is shorter and will appear as a new band much lower down (Asmamaw & Zawdie, 2021).
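The expected band shift can be estimated from the amplicon lengths. All numbers below are illustrative assumptions rather than measured values: flanking distances of 150 bp, a provirus of roughly 9.7 kb, and about 600 bp of LTR sequence left behind after excision.

```python
def amplicon_length(upstream_flank, insert_len, downstream_flank):
    """Length of the PCR product between two primers that flank an insert."""
    return upstream_flank + insert_len + downstream_flank

UPSTREAM_FLANK = 150    # hypothetical bp from forward primer to the 5' LTR
DOWNSTREAM_FLANK = 150  # hypothetical bp from the 3' LTR to the reverse primer
PROVIRUS_LEN = 9700     # approximate length of an integrated HIV-1 genome
RESIDUAL_LEN = 600      # assumed LTR sequence remaining after excision

unedited = amplicon_length(UPSTREAM_FLANK, PROVIRUS_LEN, DOWNSTREAM_FLANK)
edited = amplicon_length(UPSTREAM_FLANK, RESIDUAL_LEN, DOWNSTREAM_FLANK)
print(f"unedited: {unedited} bp, edited: {edited} bp")
```

Shorter fragments migrate further through the gel, so the edited product should run well below the unedited control.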
  1. Efficiency versus Precision

Efficiency: In a lab environment, CRISPR-Cas9 editing may succeed in cell culture, but the living body is far more complex. Getting the Cas9 enzyme to cleave LTRs in tissues such as the brain or bone marrow is still in its infancy, in part because of its potential to induce off-target cleavage and DNA damage with dangerous consequences.

Precision: Using an engineered Cas9 nuclease with improved targeting and minimal off-target toxicity, one that cuts only when the guide and target match perfectly, significantly reduces the risk of damage to the host genome.

Since we have to target both the 5’ and 3’ LTRs, it is important to check that the guide sequence is present at both ends in order to excise the integrated HIV genome from host CD4 T cells or other immune cells such as macrophages (Hunt et al., 2023).
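The check that one guide matches both LTRs, and the resulting excision, can be sketched on a toy provirus. Every sequence below is invented for illustration, only the forward strand is searched, and real repair by non-homologous end joining would usually add small indels at the junction rather than rejoin the ends perfectly.

```python
def cut_sites(dna, protospacer):
    """Blunt cut positions (3 bp upstream of the PAM) for each exact match."""
    sites = []
    start = dna.find(protospacer)
    while start != -1:
        sites.append(start + len(protospacer) - 3)
        start = dna.find(protospacer, start + 1)
    return sites

def excise(dna, protospacer):
    """Remove everything between the first and last cut site."""
    sites = cut_sites(dna, protospacer)
    if len(sites) < 2:
        raise ValueError("guide must match both LTRs to excise the provirus")
    return dna[:sites[0]] + dna[sites[-1]:]

GUIDE = "GACCTTAGCATCGGATACGT"  # hypothetical 20-nt protospacer
LTR = GUIDE + "TGG"             # toy LTR: protospacer followed by an NGG PAM
provirus = "TTTT" + LTR + "ATGAAATAA" + LTR + "CCCC"  # host-LTR-genes-LTR-host

print(cut_sites(provirus, GUIDE))
print(excise(provirus, GUIDE))
```

In this idealised case the two cuts rejoin to leave a single LTR between the host flanks, with the viral genes removed.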

What are the limitations of your editing methods (if any) in terms of efficiency or precision?

Mutations may occur in the seed region at the 3’ end of the target sequence, which is important for Cas9 recognition of and binding to the target DNA. Such mutations could prevent cleavage by the Cas9/gRNA complex and therefore the suppression of viral replication.

References

Asmamaw, M., & Zawdie, B. (2021). Mechanism and applications of CRISPR/Cas-9-mediated genome editing. Biologics: Targets & Therapy, 15, 353–361. https://doi.org/10.2147/BTT.S326422

Wang, G., Zhao, N., Berkhout, B., & Das, A. T. (2018). CRISPR-Cas based antiviral strategies against HIV-1. Virus Research, 244, 321–332. https://doi.org/10.1016/j.virusres.2017.07.020

Muyldermans, S. (2021). A guide to: Generation and design of nanobodies. The FEBS Journal, 288(7), 2084–2102. https://doi.org/10.1111/febs.15515

Webmaster, I. (2024, November 18). Synthetic biology technologies for antibody discovery. Isogenica. https://isogenica.com/synthetic-biology-technologies-for-antibody-discovery/

MacKenzie, M., & Argyropoulos, C. (2023). An introduction to nanopore sequencing: Past, present, and future considerations. Micromachines, 14(2), 459. https://doi.org/10.3390/mi14020459

Hu, W.-S., & Hughes, S. H. (2012). HIV-1 reverse transcription. Cold Spring Harbor Perspectives in Medicine, 2(10), a006882. https://doi.org/10.1101/cshperspect.a006882

Schütz, A., Bernhard, F., Berrow, N., Buyel, J. F., Ferreira-da-Silva, F., Haustraete, J., van den Heuvel, J., Hoffmann, J.-E., de Marco, A., Peleg, Y., Suppmann, S., Unger, T., Vanhoucke, M., Witt, S., & Remans, K. (2023). A concise guide to choosing suitable gene expression systems for recombinant protein production. STAR Protocols, 4(4), 102572. https://doi.org/10.1016/j.xpro.2023.102572

Hunt, J. M. T., Samson, C. A., Rand, A. du, & Sheppard, H. M. (2023). Unintended CRISPR-Cas9 editing outcomes: A review of the detection and prevalence of structural variants generated by gene-editing in human cells. Human Genetics, 142(6), 705–720. https://doi.org/10.1007/s00439-023-02561-1

Week 3 HW: Lab Automation

Assignment: Python Script for Opentrons Artwork — DUE BY YOUR LAB TIME!

A Blooming Daisy Flower PINK, PURPLE & BLUE DESIGN! :) [Image: image.png]

INITIAL DESIGN: [Image: image.png]

Python documentation

from opentrons import types

metadata = {
    'author': 'Tammy Sisodiya',
    'protocolName': 'HTGAA Dazzling Daisy',
    'description': 'A blooming Daisy flower in Purple, Pink, and Blue.',
    'source': 'HTGAA 2026 Opentrons Lab',
    'apiLevel': '2.20'
}

##############################################################################
###   Robot deck setup constants
##############################################################################

TIP_RACK_DECK_SLOT = 9
COLORS_DECK_SLOT = 6
AGAR_DECK_SLOT = 5
PIPETTE_STARTING_TIP_WELL = 'A1'

# UPDATED: Mapping the new lab colors to source wells
well_colors = {
    'A1' : 'Purple',
    'B1' : 'Pink',
    'C1' : 'Blue'
}

def run(protocol):
  # Tips
  tips_20ul = protocol.load_labware('opentrons_96_tiprack_20ul', TIP_RACK_DECK_SLOT, 'Opentrons 20uL Tips')

  # Pipettes
  pipette_20ul = protocol.load_instrument("p20_single_gen2", "right", [tips_20ul])

  # Modules
  temperature_module = protocol.load_module('temperature module gen2', COLORS_DECK_SLOT)
  temperature_plate = temperature_module.load_labware('opentrons_96_aluminumblock_generic_pcr_strip_200ul', 'Cold Plate')
  color_plate = temperature_plate

  # Agar Plate
  agar_plate = protocol.load_labware('htgaa_agar_plate', AGAR_DECK_SLOT, 'Agar Plate')
  center_location = agar_plate['A1'].top()
  pipette_20ul.starting_tip = tips_20ul.well(PIPETTE_STARTING_TIP_WELL)

  # Helper Functions
  def location_of_color(color_string):
    for well,color in well_colors.items():
      if color.lower() == color_string.lower():
        return color_plate[well]
    raise ValueError(f"No well found with color {color_string}")

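  def dispense_and_detach(pipette, volume, location):
      # Approach from 5 mm above the target, dispense on the agar, then lift
      # straight back up so the droplet detaches without smearing.
      assert(isinstance(volume, (int, float)))
      # Location.move() applies a relative offset, so z=5 means 5 mm higher.
      above_location = location.move(types.Point(z=5))
      pipette.move_to(above_location)
      pipette.dispense(volume, location)
      pipette.move_to(above_location)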

### YOUR DESIGN DATA ###
  sfgfp_points = [(-4.4, 26.4),(1.1, 26.4),(2.2, 26.4),(3.3, 26.4),(4.4, 26.4),(-6.6, 25.3),(-5.5, 25.3),(-4.4, 25.3),(-3.3, 25.3),(1.1, 25.3),(4.4, 25.3),(-12.1, 24.2),(-11, 24.2),(-9.9, 24.2),(-8.8, 24.2),(-7.7, 24.2),(-6.6, 24.2),(-2.2, 24.2),(0, 24.2),(4.4, 24.2),(-13.2, 23.1),(-12.1, 23.1),(-7.7, 23.1),(-6.6, 23.1),(-2.2, 23.1),(4.4, 23.1),(6.6, 23.1),(7.7, 23.1),(8.8, 23.1),(-13.2, 22),(-7.7, 22),(-6.6, 22),(-2.2, 22),(0, 22),(4.4, 22),(5.5, 22),(6.6, 22),(8.8, 22),(-13.2, 20.9),(-7.7, 20.9),(-2.2, 20.9),(3.3, 20.9),(4.4, 20.9),(9.9, 20.9),(-13.2, 19.8),(-7.7, 19.8),(-2.2, 19.8),(3.3, 19.8),(9.9, 19.8),(-13.2, 18.7),(-2.2, 18.7),(8.8, 18.7),(9.9, 18.7),(-13.2, 17.6),(-2.2, 17.6),(-1.1, 17.6),(8.8, 17.6),(-18.7, 16.5),(-17.6, 16.5),(-16.5, 16.5),(-15.4, 16.5),(-14.3, 16.5),(-12.1, 16.5),(-2.2, 16.5),(7.7, 16.5),(8.8, 16.5),(-20.9, 15.4),(-19.8, 15.4),(-18.7, 15.4),(-14.3, 15.4),(-13.2, 15.4),(-12.1, 15.4),(-2.2, 15.4),(-1.1, 15.4),(7.7, 15.4),(-20.9, 14.3),(-13.2, 14.3),(-12.1, 14.3),(-11, 14.3),(-3.3, 14.3),(-2.2, 14.3),(5.5, 14.3),(6.6, 14.3),(-20.9, 13.2),(-9.9, 13.2),(-8.8, 13.2),(-3.3, 13.2),(4.4, 13.2),(5.5, 13.2),(-20.9, 12.1),(-8.8, 12.1),(-7.7, 12.1),(-3.3, 12.1),(-2.2, 12.1),(2.2, 12.1),(3.3, 12.1),(5.5, 12.1),(6.6, 12.1),(7.7, 12.1),(8.8, 12.1),(9.9, 12.1),(11, 12.1),(-20.9, 11),(-19.8, 11),(-6.6, 11),(-5.5, 11),(-4.4, 11),(0, 11),(1.1, 11),(2.2, 11),(3.3, 11),(4.4, 11),(11, 11),(-19.8, 9.9),(-4.4, 9.9),(-3.3, 9.9),(-2.2, 9.9),(-1.1, 9.9),(0, 9.9),(11, 9.9),(-23.1, 8.8),(-22, 8.8),(-20.9, 8.8),(-19.8, 8.8),(-18.7, 8.8),(-17.6, 8.8),(-5.5, 8.8),(-4.4, 8.8),(-3.3, 8.8),(-2.2, 8.8),(-1.1, 8.8),(0, 8.8),(9.9, 8.8),(11, 8.8),(15.4, 8.8),(16.5, 8.8),(17.6, 8.8),(18.7, 8.8),(19.8, 8.8),(20.9, 8.8),(22, 8.8),(23.1, 8.8),(24.2, 8.8),(-23.1, 7.7),(-3.3, 7.7),(-1.1, 7.7),(0, 7.7),(1.1, 7.7),(2.2, 7.7),(8.8, 7.7),(9.9, 7.7),(14.3, 7.7),(15.4, 7.7),(16.5, 7.7),(20.9, 7.7),(22, 7.7),(24.2, 7.7),(-24.2, 6.6),(-23.1, 6.6),(-5.5, 6.6),(-4.4, 6.6),(-3.3, 6.6),(-2.2, 
6.6),(1.1, 6.6),(2.2, 6.6),(7.7, 6.6),(8.8, 6.6),(9.9, 6.6),(11, 6.6),(12.1, 6.6),(13.2, 6.6),(14.3, 6.6),(16.5, 6.6),(19.8, 6.6),(20.9, 6.6),(23.1, 6.6),(-22, 5.5),(-9.9, 5.5),(-7.7, 5.5),(-6.6, 5.5),(-5.5, 5.5),(-4.4, 5.5),(-2.2, 5.5),(3.3, 5.5),(4.4, 5.5),(13.2, 5.5),(16.5, 5.5),(18.7, 5.5),(19.8, 5.5),(23.1, 5.5),(-20.9, 4.4),(-19.8, 4.4),(-18.7, 4.4),(-17.6, 4.4),(-16.5, 4.4),(-15.4, 4.4),(-14.3, 4.4),(-13.2, 4.4),(-12.1, 4.4),(-9.9, 4.4),(-8.8, 4.4),(-7.7, 4.4),(-2.2, 4.4),(4.4, 4.4),(5.5, 4.4),(6.6, 4.4),(13.2, 4.4),(16.5, 4.4),(17.6, 4.4),(18.7, 4.4),(23.1, 4.4),(-12.1, 3.3),(-11, 3.3),(-2.2, 3.3),(5.5, 3.3),(7.7, 3.3),(8.8, 3.3),(9.9, 3.3),(11, 3.3),(12.1, 3.3),(13.2, 3.3),(16.5, 3.3),(17.6, 3.3),(23.1, 3.3),(-13.2, 2.2),(-12.1, 2.2),(-2.2, 2.2),(6.6, 2.2),(9.9, 2.2),(16.5, 2.2),(17.6, 2.2),(18.7, 2.2),(19.8, 2.2),(20.9, 2.2),(22, 2.2),(-14.3, 1.1),(-2.2, 1.1),(7.7, 1.1),(9.9, 1.1),(11, 1.1),(15.4, 1.1),(16.5, 1.1),(22, 1.1),(-15.4, 0),(-14.3, 0),(-2.2, 0),(7.7, 0),(11, 0),(14.3, 0),(15.4, 0),(22, 0),(-15.4, -1.1),(-2.2, -1.1),(3.3, -1.1),(8.8, -1.1),(13.2, -1.1),(14.3, -1.1),(20.9, -1.1),(-15.4, -2.2),(-14.3, -2.2),(-13.2, -2.2),(-12.1, -2.2),(-8.8, -2.2),(-7.7, -2.2),(-6.6, -2.2),(-2.2, -2.2),(3.3, -2.2),(8.8, -2.2),(11, -2.2),(12.1, -2.2),(13.2, -2.2),(20.9, -2.2),(-14.3, -3.3),(-11, -3.3),(-8.8, -3.3),(-7.7, -3.3),(-6.6, -3.3),(-3.3, -3.3),(-2.2, -3.3),(3.3, -3.3),(4.4, -3.3),(8.8, -3.3),(11, -3.3),(19.8, -3.3),(20.9, -3.3),(-13.2, -4.4),(-12.1, -4.4),(-11, -4.4),(-8.8, -4.4),(-3.3, -4.4),(-2.2, -4.4),(4.4, -4.4),(5.5, -4.4),(6.6, -4.4),(7.7, -4.4),(8.8, -4.4),(13.2, -4.4),(19.8, -4.4),(-16.5, -5.5),(-15.4, -5.5),(-14.3, -5.5),(-13.2, -5.5),(-8.8, -5.5),(-4.4, -5.5),(-3.3, -5.5),(-2.2, -5.5),(4.4, -5.5),(14.3, -5.5),(15.4, -5.5),(16.5, -5.5),(18.7, -5.5),(19.8, -5.5),(-19.8, -6.6),(-18.7, -6.6),(-17.6, -6.6),(-16.5, -6.6),(-8.8, -6.6),(-4.4, -6.6),(-1.1, -6.6),(3.3, -6.6),(4.4, -6.6),(17.6, -6.6),(18.7, -6.6),(-23.1, -7.7),(-22, -7.7),(-20.9, 
-7.7),(-19.8, -7.7),(-17.6, -7.7),(-16.5, -7.7),(-15.4, -7.7),(-8.8, -7.7),(-7.7, -7.7),(-6.6, -7.7),(-5.5, -7.7),(-4.4, -7.7),(-3.3, -7.7),(-2.2, -7.7),(-1.1, -7.7),(0, -7.7),(2.2, -7.7),(3.3, -7.7),(16.5, -7.7),(17.6, -7.7),(-24.2, -8.8),(-23.1, -8.8),(-14.3, -8.8),(-13.2, -8.8),(-8.8, -8.8),(-7.7, -8.8),(-3.3, -8.8),(-2.2, -8.8),(0, -8.8),(1.1, -8.8),(2.2, -8.8),(3.3, -8.8),(5.5, -8.8),(14.3, -8.8),(15.4, -8.8),(16.5, -8.8),(-26.4, -9.9),(-25.3, -9.9),(-24.2, -9.9),(-12.1, -9.9),(-11, -9.9),(-9.9, -9.9),(-8.8, -9.9),(-3.3, -9.9),(0, -9.9),(7.7, -9.9),(8.8, -9.9),(11, -9.9),(12.1, -9.9),(13.2, -9.9),(-27.5, -11),(-26.4, -11),(-25.3, -11),(-24.2, -11),(-23.1, -11),(-22, -11),(-20.9, -11),(-19.8, -11),(-18.7, -11),(-17.6, -11),(-16.5, -11),(-15.4, -11),(-14.3, -11),(-13.2, -11),(-12.1, -11),(-11, -11),(-3.3, -11),(0, -11),(-28.6, -12.1),(-27.5, -12.1),(-19.8, -12.1),(-18.7, -12.1),(-17.6, -12.1),(-15.4, -12.1),(-12.1, -12.1),(-4.4, -12.1),(-3.3, -12.1),(-2.2, -12.1),(0, -12.1),(-28.6, -13.2),(-27.5, -13.2),(-20.9, -13.2),(-19.8, -13.2),(-12.1, -13.2),(-4.4, -13.2),(0, -13.2),(-26.4, -14.3),(-25.3, -14.3),(-13.2, -14.3),(-12.1, -14.3),(-5.5, -14.3),(-2.2, -14.3),(0, -14.3),(-23.1, -15.4),(-20.9, -15.4),(-19.8, -15.4),(-13.2, -15.4),(-8.8, -15.4),(-7.7, -15.4),(-6.6, -15.4),(-1.1, -15.4),(0, -15.4),(-18.7, -16.5),(-16.5, -16.5),(-15.4, -16.5),(-14.3, -16.5),(-13.2, -16.5),(-12.1, -16.5),(-9.9, -16.5),(-8.8, -16.5),(-2.2, -16.5),(0, -16.5),(-2.2, -17.6),(0, -17.6),(0, -18.7),(-1.1, -19.8),(1.1, -19.8),(-1.1, -20.9),(1.1, -20.9),(2.2, -20.9),(0, -22),(3.3, -22),(12.1, -22),(13.2, -22),(14.3, -22),(15.4, -22),(0, -23.1),(4.4, -23.1),(5.5, -23.1),(9.9, -23.1),(11, -23.1),(12.1, -23.1),(13.2, -23.1),(1.1, -24.2),(2.2, -24.2),(5.5, -24.2),(6.6, -24.2),(7.7, -24.2),(8.8, -24.2),(9.9, -24.2),(11, -24.2),(12.1, -24.2),(2.2, -25.3),(3.3, -25.3),(4.4, -25.3),(9.9, -25.3),(11, -25.3),(5.5, -26.4),(6.6, -26.4),(7.7, -26.4),(8.8, -26.4)]
  mrfp1_points = [(-15.4, 12.1),(-14.3, 12.1),(-14.3, 11),(-13.2, 11),(-12.1, 11)]
  mscarlet_i_points = [(-11, 20.9),(-9.9, 20.9),(-11, 19.8),(-9.9, 19.8),(-9.9, 18.7)]
  mko2_points = [(3.3, 18.7),(4.4, 18.7),(5.5, 18.7),(6.6, 18.7),(4.4, 17.6)]
  mjuniper_points = [(6.6, 9.9),(7.7, 9.9),(4.4, 8.8),(5.5, 8.8),(6.6, 8.8),(7.7, 8.8),(-6.6, 2.2),(-9.9, 1.1),(-8.8, 1.1),(-7.7, 1.1),(-6.6, 1.1)]
  electra2_points = [(1.1, 4.4),(1.1, 3.3),(1.1, 2.2),(1.1, 1.1)]

  # 2. UPDATED Design Mapping
  # Purple for the large petals, Pink for highlights, Blue for details.
  layers = [
      ('Purple', sfgfp_points),
      ('Pink', mrfp1_points),
      ('Pink', mscarlet_i_points),
      ('Pink', mko2_points),
      ('Blue', mjuniper_points),
      ('Blue', electra2_points)
  ]

  # 3. Execution Loop
  drop_vol = 1.0

  for color_name, points in layers:
      if not points:
          continue

      source_well = location_of_color(color_name)

      for i in range(0, len(points), 15):
          chunk = points[i:i + 15]
          pipette_20ul.pick_up_tip()

          # One drop per point plus a 2 µL buffer, capped at the P20's 20 µL capacity
          aspirate_vol = min((len(chunk) * drop_vol) + 2.0, 20.0)

          pipette_20ul.aspirate(aspirate_vol, source_well)

          for x, y in chunk:
              # Skip any point 40 mm or more from the plate centre (40**2 = 1600)
              if (x**2 + y**2) < 1600:
                  target_point = center_location.point + types.Point(x=x, y=y, z=0)
                  target_loc = types.Location(target_point, None)
                  dispense_and_detach(pipette_20ul, drop_vol, target_loc)

          # Return residual to source well top to avoid contamination
          if pipette_20ul.current_volume > 0:
              pipette_20ul.dispense(pipette_20ul.current_volume, source_well.top())

          pipette_20ul.drop_tip()
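
Before loading the robot, the chunking logic above can be mirrored in plain Python to estimate how many tips and how much of each colour a design needs. This is only a minimal sketch: `estimate_usage` and the toy layers are hypothetical, while the 15-point chunk size, 1 µL drop, 2 µL buffer and 20 µL cap are copied from the loop above.

```python
import math

def estimate_usage(layers, chunk_size=15, drop_vol=1.0, buffer_vol=2.0, max_vol=20.0):
    """Return (total tips, {colour: aspirated µL}) for a list of (colour, points) layers."""
    tips = 0
    volume_by_colour = {}
    for colour, points in layers:
        if not points:
            continue
        # One fresh tip per chunk, as in the execution loop
        tips += math.ceil(len(points) / chunk_size)
        for i in range(0, len(points), chunk_size):
            chunk = points[i:i + chunk_size]
            vol = min(len(chunk) * drop_vol + buffer_vol, max_vol)
            volume_by_colour[colour] = volume_by_colour.get(colour, 0.0) + vol
    return tips, volume_by_colour

# Toy layers (not the real design data): 32 purple points and 5 pink points
toy_layers = [("Purple", [(0, 0)] * 32), ("Pink", [(1, 1)] * 5)]
tips, volumes = estimate_usage(toy_layers)
print(tips, volumes)
```

Running this on the real `layers` list would show whether the design fits within a single 96-tip rack and how much of each colour to load in the source wells.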

Post-Lab Questions — DUE BY START OF FEB 24 LECTURE

  1. I found a paper that uses Opentrons-2 liquid handling to mix and set up protein crystallization plates. It uses Hen Egg White Lysozyme (HEWL) as a model system to validate the robot’s capabilities, and a periplasmic protein (used as a framework/scaffold for large batches of similarly structured crystals). Protein crystallization aids 3D structure determination by X-ray crystallography. If we know a protein’s structure, its interactions and its function, we can use this to drive the design, modelling and optimization of new drugs that fit the protein’s active site. The researchers automated the setup of 24-well sitting-drop plates for these two proteins. Compared with manual pipetting, the Opentrons robot decreased manual labour, increased pipetting reliability, and potentially decreased person-to-person variability across plates (DeRoo et al., 2025).

2) Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details.

For my idea 1: The Opentrons protocol can be used in a 96-well plate to mix different PETase enzyme mutants / variations of this enzyme with 5 different plastic types.
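
As a rough sketch of how the mutant and plastic combinations for idea 1 could be assigned to wells before writing the Opentrons protocol, the snippet below builds a plate map in plain Python. The function `plate_layout`, the variant names and the plastic names other than PET are placeholders, and the column-by-column fill order is an assumption chosen to match the usual A1, B1, C1... well ordering.

```python
from itertools import product
from string import ascii_uppercase

def plate_layout(mutants, plastics, rows=8, cols=12):
    """Map each (mutant, plastic) pair to a well name, filling the plate column by column."""
    wells = [f"{ascii_uppercase[r]}{c + 1}" for c in range(cols) for r in range(rows)]
    pairs = list(product(mutants, plastics))
    if len(pairs) > len(wells):
        raise ValueError(f"{len(pairs)} combinations do not fit in {len(wells)} wells")
    return dict(zip(wells, pairs))

mutants = ["WT", "variant_1"]  # placeholder variant names
plastics = ["PET", "plastic_2", "plastic_3", "plastic_4", "plastic_5"]  # only PET is named in the text
layout = plate_layout(mutants, plastics)
print(layout["A1"], layout["B2"])
```

With 5 plastics, a 96-well plate holds at most 19 variants per plate (95 wells), which leaves one well free for a blank.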

A side project I initially thought of as part of my original idea 1: measuring the PETase degradation rate.

I would like to screen for stable PETase variants by using Ginkgo Nebula to design a library of 50 to 100 PETase mutants, specifically targeting the W159H/S238F double mutation, which has been shown to improve stabilisation and binding affinity to PET in Ideonella sakaiensis PETase (Austin et al., 2018).

The steps I will take to get them sequenced and produced by the foundry: 1) ensure the DNA sequence variants are codon optimised and 2) request delivery as 96-well plates or expression-ready plasmids.

I hope to develop 3D-printed lab equipment, such as a PET film clamping insert, which will stop the PET films from floating and keep them submerged, so that the Opentrons pipette deposits the enzyme directly on the film surface.

I would then like to automate the measurement of how fast PETase degrades plastic, optimising the conditions for PETase activity, which depends on temperature, pH and NaCl concentration. We can use a multiplexed bioreactor array with precise control over pH, temperature, oxygen transfer and nutrient feeding in its chambers, so that many parallel experiments (cell cultures) can run at the same time. I would like to use hardware such as an Arduino / ESP32 to control 4–8 small vials, each with its own heating element and pH probe. A control loop keeps the pH constant: when PETase breaks down PET into its monomers, terephthalic acid and ethylene glycol, the solution becomes more acidic. The automation system records how much base (for example NaOH) it has to pump in to neutralise the acid. The cumulative quantity of base added then serves as a proxy for the plastic degradation rate and can be plotted as a graph over time.
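
The pH-holding loop described above can be sketched in Python before moving it to a microcontroller. Everything here is an assumption for illustration: the setpoint of pH 8.0, the 0.05 deadband, the 5 µL dose, the 0.01 M NaOH stock and the simulated readings stand in for a real probe and pump.

```python
def ph_stat_step(ph_reading, setpoint=8.0, deadband=0.05, dose_ul=5.0):
    """Return the NaOH dose (µL) for one control cycle: dose only when pH falls below the band."""
    return dose_ul if ph_reading < setpoint - deadband else 0.0

def naoh_to_umol(volume_ul, naoh_molar=0.01):
    """Convert a dosed NaOH volume to µmol of base (µL x mol/L = µmol)."""
    return volume_ul * naoh_molar

# Simulated pH trace standing in for a real probe
readings = [8.00, 7.97, 7.94, 7.99, 7.93, 7.96, 7.92]
total_ul = 0.0
log = []  # (cycle, pH, cumulative µL) rows, ready for plotting
for cycle, ph in enumerate(readings):
    total_ul += ph_stat_step(ph)
    log.append((cycle, ph, total_ul))

print(f"NaOH dosed: {total_ul:.1f} µL = {naoh_to_umol(total_ul):.2f} µmol of acid neutralised")
```

Plotting the cumulative volume in `log` against time gives the degradation curve; its slope is the proxy for degradation rate.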

For my idea 2: ProteinMPNN-generated phage tail sequences will be stored and annotated in Benchling, and the 5 top-ranked phage tail sequences will be run through Nebula’s codon optimisation tool for bacteria. To test the engineered phages against target and non-target bacteria, I’d use a phage spotting assay, which I’d automate with an Opentrons protocol.

For my idea 3: To create a GFP-based cancer mutation biosensor, I will use Benchling to design the genetic construct (GFP-antibody fragment/nanobody fusion, promoter, linker sequences), Ginkgo Nebula to codon optimise and synthesise the sequence, and an Opentrons protocol to automate the 96-well plate fluorescence assay, in which the GFP biosensor is exposed to samples carrying mutant and non-mutant KRAS genes.

Final Project Ideas — DUE BY START OF FEB 24 LECTURE

BTS: Inspirations, content and why I chose these 3 project ideas?

My inspiration for the first idea, the in silico PETase substrate specificity project, was to understand how plastics other than PET could be degraded by the enzyme, and how introducing mutations in the enzyme’s stabilisation sites may increase its binding affinity for the plastic. More generally, I am very interested in bioremediation of plastic pollution, which is widespread.

My inspiration for the second idea was that phages use receptor binding proteins (RBPs) to bind to bacteria. Because the high specificity of RBPs for particular bacterial surface receptors limits host range, we can engineer the receptor binding loops in the RBP region by swapping similar genes between phages, and through insertions, mutations and gene recombination, to change binding affinity for different bacterial surface proteins and broaden the phage host range.

My inspiration for the third idea came from my initial meeting at Lifefabs about using phytoplankton as water quality biosensors, combined with what I learned at the BioBootcamp and in HTGAA classes about GFP: a sensor that glows when mutations are detected would give a biological readout from an active molecular detector, which could help prevent cancer developing downstream.

References

DeRoo, J. B., Jones, A. A., Slaughter, C. K., Ahr, T. W., Stroup, S. M., Thompson, G. B., & Snow, C. D. (2025). Automation of protein crystallization scaleup via Opentrons-2 liquid handling. SLAS Technology, 32, 100268. https://doi.org/10.1016/j.slast.2025.100268

Austin, H. P., Allen, M. D., Donohoe, B. S., Rorrer, N. A., Kearns, F. L., Silveira, R. L., Pollard, B. C., Dominick, G., Duman, R., El Omari, K., Mykhaylyk, V., Wagner, A., Michener, W. E., Amore, A., Skaf, M. S., Crowley, M. F., Thorne, A. W., Johnson, C. W., Woodcock, H. L., … Beckham, G. T. (2018). Characterization and engineering of a plastic-degrading aromatic polyesterase. Proceedings of the National Academy of Sciences, 115(19). https://doi.org/10.1073/pnas.1718804115

Week 4 HW: Protein Design Part I

Part A. Conceptual Questions

  1. How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

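About 6.022 x 10^23 molecules of amino acids. This assumes meat is roughly 20% protein by mass, so 500 g of meat contains about 100 g of protein; 100 g ÷ 100 g/mol = 1 mol, and 1 mol is 6.022 x 10^23 molecules. If the whole 500 g were counted as amino acids, the figure would be 5 mol, or about 3.0 x 10^24 molecules.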
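
The estimate for the 500 g question can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the ~100 Da per amino acid comes from the question, while the `protein_fraction` parameter and its 0.2 value (meat being roughly 20% protein by mass) are assumptions added here.

```python
AVOGADRO = 6.022e23              # molecules per mole
RESIDUE_MASS_G_PER_MOL = 100.0   # ~100 Da per amino acid, from the question

def amino_acid_molecules(mass_g, protein_fraction=1.0):
    """Number of amino acid molecules in mass_g of food with the given protein fraction."""
    moles = mass_g * protein_fraction / RESIDUE_MASS_G_PER_MOL
    return moles * AVOGADRO

print(f"All 500 g as amino acids: {amino_acid_molecules(500):.3e}")
print(f"Assuming 20% protein:     {amino_acid_molecules(500, 0.2):.3e}")
```

The two results bracket the answer: about 3.0 x 10^24 if the whole mass is counted, and about 6.0 x 10^23 if only the protein fraction is counted.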

  1. Why do humans eat beef but do not become a cow, eat fish but do not become fish? Humans digest the proteins in food into individual amino acids. These amino acids are then reassembled into human proteins according to the instructions in human DNA, incorporated into human tissues or used for energy, and the leftovers are excreted as normal.

  2. Why are there only 20 natural amino acids?

There are only 20 amino acids because, approximately 4 billion years ago on Earth, there was only a limited choice of C, H, N, O and S atoms, which can form a limited set of functional groups. These amino acids were also selected for their favourable solubility, folding and stability, taking into account the amount of energy and resources needed to synthesise the protein (Doig, 2017).

  1. Can you make other non-natural amino acids? Design some new amino acids.

Design considerations I would take:

  • I would like to produce a therapeutic non-natural amino acid (NAA) that extends a drug’s half-life.
  • I would like my amino acid to be in the D-isomer form, as the L-isomer form is readily broken down by the body’s enzymes.
  • I’ll add a CH3 group to the alpha-carbon to protect it from enzymatic degradation, so that it remains in the bloodstream.
  • I’ll improve membrane permeability to increase bioavailability and absorption of the amino acid drug, first by creating a prodrug form of it and also by using hydrophobic groups.
  • Research on non-natural amino acids shows that when fluorine replaces hydrogen, it can increase stability in the body without making the amino acid too bulky.

My idea targets GLP-1, the hormone that Ozempic, the weight loss and diabetes drug, mimics. Native GLP-1 is degraded quickly in the bloodstream by the enzyme DPP-4. I’d like to add an alpha-methylated NAA or a D-amino acid at the DPP-4 cleavage site. By incorporating an NAA such as alpha-methyl-L-phenylalanine into the peptide, I aim to extend the half-life of the drug from minutes to over a week, which would make it suitable as a long-term treatment for metabolic disorders (Kohnke & Zhang, 2026).

  1. Where did amino acids come from before enzymes that make them, and before life started? Amino acids were produced by abiotic (prebiotic) chemistry: gases such as methane and nitrogen could react under lightning (electrical discharge) to produce amino acids. They have also been found in meteorites, and may be produced by reactions in extreme environments such as deep-sea hydrothermal vents (Cowing, 2023).

  2. If you make an α-helix using D-amino acids, what handedness (right or left) would you expect? It will form a left-handed helix.

  3. Can you discover additional helices in proteins?

Zhang and Egli (2022) describe a chemical classification that distinguishes additional helix types in proteins. It is based on the composition of the helices: whether they contain hydrophobic (water-hating) or hydrophilic (water-loving) amino acids, and hence their water solubility. Structurally identical α-helices are organised into three chemically distinct types: Type I, hydrophilic α-helix; Type II, hydrophobic α-helix; Type III, amphiphilic α-helix. The QTY code exploits the structural similarities between specific polar and non-polar amino acids to redesign proteins. By replacing hydrophobic residues with chemically similar hydrophilic residues, water-insoluble membrane proteins can be converted into soluble variants without losing their initial structure or function. This framework provides a simplified “molecular code” for protein design, comparable to the base-pairing rules of DNA.

  1. Why are most molecular helices right-handed?

Left-handed helices are less energetically favourable and less stable, as the right-handed conformation promotes more hydrogen bonding along the peptide backbone. Through evolution, life settled on L-amino acids and D-sugars, and this chirality favours the right-handed conformation. As the helices fold, the right-handed conformation is kinetically trapped by an unusually high unfolding barrier, which causes extremely slow protein unfolding rates (Manning & Colón, 2004). Moreover, steric hindrance (larger chemical groups physically obstructing one another) occurs in the left-handed conformation, so the right-handed conformation is favoured.

  1. Why do β-sheets tend to aggregate?

Beta-sheets aggregate because their hydrophobic side chains stick out and because the peptide backbone forms intermolecular hydrogen bonds with neighbouring strands. This causes stacking and increases the stability of the beta-sheets, leading to aggregation.

  • What is the driving force for β-sheet aggregation?

The driving force is intermolecular hydrogen bonding between the peptide backbone of one strand and its neighbouring strand, together with the protruding hydrophobic side chains, which builds a highly stable and rigid structure that aggregates.

  1. Why do many amyloid diseases form β-sheets?

In neurodegenerative diseases like Alzheimer’s, Parkinson’s and prion diseases, and in endocrine disorders like Type II diabetes, proteins lose their normal, functional 3D native structure and misfold into beta-sheet-rich forms. This leads to aggregation, toxicity and disease, for example the plaques that cause cognitive and memory deficits. Amyloid diseases produce flattened fibrils that form plaques which are extremely insoluble, stable and strong, because in the beta-sheet structure hydrophobic side chains stick out and the peptide backbone hydrogen bonds with neighbouring strands. This insoluble, stable structure prevents proteases from degrading and recycling these proteins.

  • Can you use amyloid β-sheets as materials? Because amyloid β-sheets are highly rigid and stable and have a high adsorption capacity, they can be used for bioremediation to capture water contaminants such as toxins and heavy metals, to produce biodegradable and sustainable bioplastics, and to make protein-based biogels and matrices for 3D tissue bioprinting.
  1. Design a β-sheet motif that forms a well-ordered structure. XX (I would like to attempt this later!) XX

Part B: Protein Analysis and Visualization

1. Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions. Briefly describe the protein you selected and why you selected it.

I chose the p53 protein, which triggers programmed cell death when a cell suffers extensive DNA damage from stresses such as UV light, oxygen radicals or chemicals, as happens in cancer. In a cancerous cell, p53 travels to the nucleus and signals the mitochondria to release reactive oxygen species or increase calcium levels. Other death factors released include cytochrome c, which activates caspases, and SMAC, which blocks survival proteins (Fogg et al., 2011). I selected this protein because mutations in it can cause cancer and because it is vital for protecting the human genome from damage.

2. Identify the amino acid sequence of your protein. How long is it? What is the most frequent amino acid? You can use this notebook to count most frequent amino acid - https://colab.research.google.com/drive/1vlAU_Y84lb04e4Nnaf1axU8nQA6_QBP1?usp=sharing

p53 is 393 amino acids long.

The most common amino acid is: P (Proline), which appears 45 times.
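
For reference, counting the most frequent residue takes only a few lines of standard-library Python. This is a minimal sketch and not the course notebook’s code: `most_common_residue` is a hypothetical helper, and the input below is just the first 16 residues of the human p53 sequence shown in the alignment, not the full 393-residue protein.

```python
from collections import Counter

def most_common_residue(sequence):
    """Return (length, most frequent one-letter residue, its count) for a protein sequence."""
    # Drop whitespace and newlines so a pasted FASTA body can be used directly
    seq = "".join(sequence.split()).upper()
    residue, count = Counter(seq).most_common(1)[0]
    return len(seq), residue, count

# Short fragment for illustration only
print(most_common_residue("MEEPQSDPSVEPPLSQ"))  # (16, 'P', 4)
```

Pasting the full UniProt P04637 sequence into the same function gives the length and top residue for the whole protein.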

How many protein sequence homologs are there for your protein? Hint: Use the pBLAST tool to search for homologs and ClustalOmega to align and visualize them.

I found 175 total homologs using the p53 human version (https://www.uniprot.org/uniprotkb/P04637/entry).

My Clustal Omega alignment is below:

CLUSTAL O(1.2.4) multiple sequence alignment


Zebrafish_P53      -------------MAQNDSQEFAELWEKN----LISIQPPGGGSCWDII-----NDEEYL	38
Frog_P53           ----MEPSSETGMDPPLSQETFEDLWSLLPDPLQTVTCR-------------LDNLSEFP	43
Human_P53          ---MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLP---SQAMDDLMLSPDDIEQWF	54
Mouse_P53          MTAMEESQSDISLELPLSQETFSGLWKLLPPEDILPS-----PHCMDDLLL-PQDVEEFF	54
Rat_P53            ---MEDSQSDMSIELPLSQETFSCLWKLLPPDDILPTTATGSPNPMEDLFL-PQDVAELL	56
                                    ..: *  **.                           :  :  

Zebrafish_P53      ---PGSFDPNFFG-NV-----LEEQP------QPSTLPPTSTVPETSDYPGDHGFRLRFP	83
Frog_P53           D-YPLAADMTV------LQ--------EGLMGNAVPTVTSCAVPSTDDYAGKYGLQLDFQ	88
Human_P53          TEDPGPDEAPRMPEAAPPVAPAPATPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFL	114
Mouse_P53          E---GPSEALRVSGAPAAQDPVTETPGPVAPAPATPWPLSSFVPSQKTYQGNYGFHLGFL	111
Rat_P53            E---GPEEALQVS-APAAQEPGTEAPAPVAPASATPWPLSSSVPSQKTYQGNYGFHLGFL	112
                          :                               :. **. . * *.:*::* * 

Zebrafish_P53      QSGTAKSVTCTYSPDLNKLFCQLAKTCPVQMVVDVAPPQGSVVRATAIYKKSEHVAEVVR	143
Frog_P53           QNGTAKSVTCTYSPELNKLFCQLAKTCPLLVRVESPPPRGSILRATAVYKKSEHVAEVVK	148
Human_P53          HSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVR	174
Mouse_P53          QSGTAKSVMCTYSPPLNKLFCQLVKTCPVQLWVSATPPAGSRVRAMAIYKKSQHMTEVVR	171
Rat_P53            QSGTAKSVMCTYSISLNKLFCQLAKTCPVQLWVTSTPPPGTRVRAMAIYKKSQHMTEVVR	172
                   :.****** ****  ***:****.****: : *   ** *: :** *:**:*:*::***:

Zebrafish_P53      RCPHHERTP-DGDNLAPAGHLIRVEGNQRANYREDNITLRHSVFVPYEAPQLGAEWTTVL	202
Frog_P53           RCPHHERSVEPGEDAAPPSHLMRVEGNLQAYYMEDVNSGRHSVCVPYEGPQVGTECTTVL	208
Human_P53          RCPHHERCS-DSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIH	233
Mouse_P53          RCPHHERCS-DGDGLAPPQHLIRVEGNLYPEYLEDRQTFRHSVVVPYEPPEAGSEYTTIH	230
Rat_P53            RCPHHERCS-DGDGLAPPQHLIRVEGNPYAEYLDDKQTFRHSVVVPYEPPEVGSDYTTIH	231
                   *******    .:. **  **:*****    * :*  : **** **** *: *:: **: 

Zebrafish_P53      LNYMCNSSCMGGMNRRPILTIITLETQEGQLLGRRSFEVRVCACPGRDRKTEESNFKKDQ	262
Frog_P53           YNYMCNSSCMGGMNRRPILTIITLETPQGLLLGRRCFEVRVCACPGRDRRTEEDNYTKKR	268
Human_P53          YNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKG	293
Mouse_P53          YKYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRDSFEVRVCACPGRDRRTEEENFRKKE	290
Rat_P53            YKYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRDSFEVRVCACPGRDRRTEEENFRKKE	291
                    :***********************  .* **** .*************:***.*  *. 

Zebrafish_P53      ETKTMAKTTTGTKRSLVKESSSATLRPEGSKKAKGSSSDEEIFTLQVRGRERYEILKKLN	322
Frog_P53           GLKPS------GKRELAHPPS---SEPPLPKKRLVVDDDEEIFTLRIKGRSRYEMIKKLN	319
Human_P53          EPHHELPPGS-TKRALPNNTS---SSPQPKKK----PLDGEYFTLQIRGRERFEMFRELN	345
Mouse_P53          VLCPELPPGS-AKRALPTCTS---ASPPQKKK----PLDGEYFTLKIRGRKRFEMFRELN	342
Rat_P53            EHCPELPPGS-AKRALPTSTS---SSPQQKKK----PLDGEYFTLKIRGRERFEMFRELN	343
                               ** *    *     *   **      * * ***:::**.*:*::::**

Zebrafish_P53      DSLELSDVVPASDAEKYRQKFMTKNKKENRGSSEPKQGKKLMVKDEGRSDSD	374
Frog_P53           DALELQESLDQQKVTI--------KCRKCRDEIKPKKGKKLLVKDEQPDSE-	362
Human_P53          EALELKDAQAGKEPGGSRAHS---SHLKSKKGQSTSRHKKLMFKTEGPDSD-	393
Mouse_P53          EALELKDAHATEESGDSRAHS---SYLKTKKGQSTSRHKKTMVKKVGPDSD-	390
Rat_P53            EALELKDAHAAEESGDSRAHS---SYPKTKKGQSTSRHK-PMIKKVGPDSD-	390
                   ::***.:    ..           .  : :   . .: *  :.*    ... 

Does your protein belong to any protein family?

The protein belongs to the p53 family.

Identify the structure page of your protein in RCSB When was the structure solved? Is it a good quality structure? Good quality structure is the one with good resolution. Smaller the better (Resolution: 2.70 Å)

The structure for 1TUP (https://www.rcsb.org/structure/1TUP) was solved and made public in 1995. Its resolution is 2.20 Å, which is fairly high, so it is a good-quality structure.

Are there any other molecules in the solved structure apart from protein?

DNA which is bound and complexed with p53.

Does your protein belong to any structure classification family?

The 1TUP structure represents the core domain of the p53 tumour suppressor family in complex with its target DNA binding site.

3D molecule visualizations

CARTOON: Cartoon_1TUP.png

RIBBON: Ribbon_1TUP.png

BALL & STICK: Ball&Stick_1TUP.png

Color the protein by secondary structure. Does it have more helices or sheets?

SECONDARY STRUCTURE: SecondaryStructure_1TUP.png

Colour key:

  • Beta-sheets (arrows) = YELLOW
  • Alpha-helices (spirals) = RED
  • Loops = GREEN
  • Nucleic acid = CYAN

It has more beta sheets than helices, consisting of two opposing antiparallel β-sheets.

Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?

Residuetype_1TUP.png

1TUP has more hydrophilic residues (coloured in white) than hydrophobic residues (coloured in orange).

Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?

Yes, 1TUP has many binding pockets. 1) It has a DNA Binding cleft, 2) it has a zinc cofactor binding pocket important for its shape, 3) it has hydrophobic pockets, which drugs can target.

Part C. Using ML-Based Protein Design Tools

C1. Protein Language Modeling

1a. 2LZM (Bacteriophage T4 Lysozyme)

newplot.png

1b. Key residues/stretches of residues that stood out to me: in the protein language model mutation scan, Position 51, Mutation to Tyrosine [Y], Score: 3.037, coloured yellow/green, has the highest score in its entire column and row. A high score means the model considers the substitution favourable, which suggests it may stabilise the enzyme and preserve or improve its activity, for example by increasing rigidity and keeping the fold well compacted.

In the Proline (P) row, the scores become strongly negative compared with the Alanine (A) or Serine (S) rows, indicating deleterious mutations: the protein may not be able to fold correctly, or may lose so much stability that it is likely to be degraded by the proteasome, the cell’s protein recycling machinery.
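
One common way to turn a masked language model’s output into a per-mutation score is the log-likelihood ratio between the mutant and wild-type residue at that position. Whether the course notebook uses exactly this formula is an assumption, and the probabilities below are invented for illustration (only a subset of the 20 residues is listed); `mutation_score` is a hypothetical helper.

```python
import math

def mutation_score(position_probs, wild_type, mutant):
    """Log-likelihood ratio log p(mutant) - log p(wild type) at one masked position."""
    return math.log(position_probs[mutant]) - math.log(position_probs[wild_type])

# Made-up model probabilities for a few residues at a single position
probs = {"A": 0.10, "Y": 0.40, "P": 0.001, "G": 0.05}

print(round(mutation_score(probs, "G", "Y"), 3))  # positive: the model prefers Y over G
print(round(mutation_score(probs, "G", "P"), 3))  # negative: the model disfavours P
```

Under this scoring, a positive value means the model finds the mutant residue more plausible than the wild type in that sequence context, which is why strongly negative proline rows are read as likely deleterious.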

1c. (Bonus) Find sequences for which we have experimental scans, and compare the prediction of the language model to experiment.

Rennell et al. (1991) conducted a systematic mutational scan of T4 lysozyme, which found that the Position 51 mutation to Tyrosine [Y] was a stabilising mutation. Similarly, mutation from any amino acid to proline gave a misfolded/degraded protein, which matches the highly negative scores throughout the proline row of the protein language model.

Latent Space Analysis

Use the provided sequence dataset to embed proteins in reduced dimensionality

newplot (4).png

Found target at index 13819: d1k9oi_ e.1.1.1 (I:) Alaserpin (serpin 1) {Tobacco hornworm (Manduca sexta) [TaxId: 7130]}
--- Neighborhood of d1k9oi_ e.1.1.1 (I:) Alaserpin (serpin 1) {Tobacco hornworm (Manduca sexta) [TaxId: 7130]} ---
Dist: 0.00 | ID: d1k9oi_ e.1.1.1 (I:) Alaserpin (serpin 1) {Tobacco hornworm (Manduca sexta) [TaxId: 7130]}
Dist: 0.14 | ID: d2wqfa_ d.90.1.0 (A:) automated matches {Lactococcus lactis [TaxId: 1358]}
Dist: 0.22 | ID: d1q2la1 d.185.1.1 (A:504-732) Protease III {Escherichia coli [TaxId: 562]}
Dist: 0.23 | ID: d2b0ta_ c.77.1.2 (A:) automated matches {Corynebacterium glutamicum [TaxId: 1718]}
Dist: 0.26 | ID: d5jhxa2 a.93.1.0 (A:476-784) automated matches {Magnaporthe oryzae [TaxId: 242507]}
Dist: 0.26 | ID: d1q2la2 d.185.1.1 (A:733-960) Protease III {Escherichia coli [TaxId: 562]}
Dist: 0.27 | ID: d1nhpa3 d.87.1.1 (A:322-447) NADH peroxidase {Enterococcus faecalis [TaxId: 1351]}
Dist: 0.30 | ID: d7c6ca_ d.2.1.0 (A:) automated matches {Bacillus subtilis [TaxId: 224308]}
Dist: 0.32 | ID: d1q2la3 d.185.1.1 (A:264-503) Protease III {Escherichia coli [TaxId: 562]}
Dist: 0.34 | ID: d1dpga2 d.81.1.5 (A:182-412,A:427-485) Glucose 6-phosphate dehydrogenase {Leuconostoc mesenteroides [TaxId: 1245]}

Analyze the different formed neighborhoods: do they approximate similar proteins?

Yes, they do, but with some variation. My 3D t-SNE analysis shows that the neighbourhoods around T4 Lysozyme (2LZM) approximate functional and structural archetypes rather than simply identical sequences. Neighbours include Protease III, Alaserpin and NADH peroxidase, which are involved in regulatory and degradation functions, similar to bacteriophage T4 lysozyme, which degrades bacterial peptidoglycan cell walls. Structurally, they appear to share high alpha-helical content, a similar spread of surface charge, or the specific pocket geometries required for substrate binding.
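
A neighbourhood listing like the one above can be produced with a simple nearest-neighbour search over the embedding vectors. The sketch below is illustrative only: the cosine distance metric is an assumption (the notebook may use Euclidean distance or distances in the reduced t-SNE space), the helper names are hypothetical, and the 3-D vectors are made up.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def nearest_neighbours(embeddings, target_id, k=3):
    """Return the k entries closest to target_id as (distance, id), nearest first."""
    target = embeddings[target_id]
    scored = [(cosine_distance(target, vec), pid)
              for pid, vec in embeddings.items() if pid != target_id]
    return sorted(scored)[:k]

# Made-up 3-D embeddings; real protein language model embeddings are much larger
toy = {
    "target": [1.0, 0.0, 0.0],
    "close": [0.9, 0.1, 0.0],
    "medium": [0.5, 0.5, 0.0],
    "far": [0.0, 0.0, 1.0],
}
for dist, pid in nearest_neighbours(toy, "target", k=2):
    print(f"Dist: {dist:.2f} | ID: {pid}")
```

Comparing the SCOP class labels of the returned neighbours against the target is a quick way to judge whether a neighbourhood really groups similar proteins.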

Place your protein in the resulting map and explain its position and similarity to its neighbors.

The T4 lysozyme protein sits in a cluster of highly ordered, multi-domain globular proteins. In the 3D plot, it is away from the “Globin” cluster (the a.1.1.1 group) and instead occupies a space populated by metabolic and regulatory machinery. Many of the neighbours (like Protease III and Alaserpin) are involved in the cleavage of bonds or its inhibition, just as T4 Lysozyme’s job is to break down bacterial cell walls by hydrolysing the peptidoglycan bond. The source organisms are also shared: the neighbours come from E. coli, Lactococcus lactis and Enterococcus faecalis, all bacteria. Since T4 lysozyme is a bacteriophage enzyme (which attacks bacteria such as E. coli), it shares a biochemical language with the proteins of its prey.

C2. Protein Folding

Fold your protein with ESMFold. Do the predicted coordinates match your original structure?

Output:
Total sequence length: 164
Running ESMFold inference for sequence with length 164…
Prediction complete.
ptm: 0.889
plddt: 93.913

The ESMFold-predicted coordinates for phage T4 Lysozyme show a high structural resemblance to the X-ray crystallography structure (PDB 2LZM), with a TM-score > 0.9.

Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?

It is resilient to mutations. I tried a well-known type of mutation, leucine to alanine, which changes a bulky amino acid to a small one, and this had no apparent effect on the predicted fold or globular 3D shape of the protein.

C3. Protein Generation

Inverse-Folding a protein: Let’s now use the backbone of your chosen PDB to propose sequence candidates via ProteinMPNN. Analyze the predicted sequence probabilities and compare the predicted sequence vs the original one. Input this sequence into ESMFold and compare the predicted structure to your original.

Chain A of T4 Lysozyme is 164 residues long.

2LZM, score=1.1319, fixed_chains=[], designed_chains=['A'], model_name=v_48_020
MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNCNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRCALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
T=0.1, sample=0, score=0.7766, seq_recovery=0.6098
MNIYEMLKILEGLRLKIYKDRYGNYTIGIGHLLTKDPSLEKAKAKLDEAIGRKTNGVITEEEANELFEKDVKAAIEAIKNNPVLKPVYDSLDEVRKMALIALVFRMGEKGVAGLKETLALLKEGKWDEAAETLKKSRWYKNEPENAEKIITMFKTGTFEAFEED

Amino acid probabilities

newplot (6).png

Sampling temperature adjusted amino acid probabilities

newplot (7).png

Feature | Result observed
Sequence recovery | 60.98% (High similarity to native T4 Lysozyme).
Probability analysis | Sharp peaks at the catalytic core; high entropy on surface loops.
Mutational strategy | Mostly conservative (hydrophobic-to-hydrophobic) swaps.
Structural match | Predicted to be nearly identical to 2LZM (RMSD < 1.0 Å).
Conclusion | The T4 Lysozyme backbone is a highly designable scaffold.

There is high certainty for residues in the hydrophobic core and active site cleft. The ProteinMPNN sequence candidate had 60.98% sequence identity to the native T4 sequence, and hydrophobic residues and surface charge remained conserved across both.
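
Sequence recovery is the fraction of positions where the designed residue matches the native one. The snippet below is a minimal sketch of that calculation (`sequence_recovery` is a hypothetical helper, not ProteinMPNN’s own code), applied to the first 10 residues of the native and designed sequences reported above.

```python
def sequence_recovery(native, designed):
    """Fraction of aligned positions where the designed residue equals the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length (same backbone)")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)

# First 10 residues of each sequence: 7 of 10 positions match
print(sequence_recovery("MNIFEMLRID", "MNIYEMLKIL"))  # 0.7
```

For the full 164-residue chain, the reported seq_recovery of 0.6098 corresponds to 100 identical positions (100 / 164 ≈ 0.6098).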

Part D. Group Brainstorm on Bacteriophage Engineering

Question 3) I am addressing project goal 1) Higher toxicity of lysis protein (hard), and mixing in elements of Increased stability (easiest), as a computational game plan.

Goal 1: I want to disrupt the interaction between MS2-L and E. coli DnaJ. MS2-L depends on DnaJ for lysis; DnaJ has a regulatory role that slows down lysis and allows time for virus particle construction. By disrupting the interaction, I’d like to increase bacterial lysis and also maximise the burst size or enhance the phage lytic cycle.

Goal 2: Increase the toxicity of the lysis protein: optimise the essential C-terminal lytic domain (Domains 2–4) and the conserved LS motif (Leu48-Ser49) to increase its membrane rupture, and thus cell lysis, abilities.


Tools and approaches I would like to use

Protein Language Models (ESM2) for in silico deep mutational scanning: ESM2 can be used to run a mutational scan of the MS2-L sequence. This shows which amino acids in the N-terminal domain are important for DnaJ binding and which amino acids in the C-terminal domain are more tolerant of change.

ESMFold will be used for structure prediction of non-mutated MS2-L and its variants, to ensure that mutations intended to disrupt DnaJ binding do not compromise the overall fold or the predicted alpha-helical transmembrane domain.

ProteinMPNN will be used to remodel the DnaJ-binding N-terminal domain to reduce inhibitory interactions and to optimise the C-terminal domain for enhanced membrane binding.


Justification for tool selection

ESM2 can scan for the conserved LS motif without needing an established structure, because it relies on evolutionary patterns learned from sequences.

ProteinMPNN can design sequences that fit a given backbone and are often more stable than natural sequences.

ESMFold predicts 3D atomic-level protein structures directly from amino acid sequences, which lets me check that variants maintain the alpha-helical structure (residues 37 to the C-terminus) required for membrane insertion.


Potential Pitfalls

  • Higher-order oligomerization of MS2-L: computational folding tools mainly model monomers or small multimers, so it is hard to predict how mutations will affect oligomerization of MS2-L.

Pipeline Schematic

Input: Non-mutated MS2-L sequence and a predicted alpha-helical backbone.

Analysis: ESM2 Deep Mutational Scan to check DnaJ-binding amino acid residues in the N-terminus and important lytic amino acid residues in the C-terminus.

Design: ProteinMPNN remodelling of the N-terminal domain to stop inhibitory interactions and optimise the C-terminal domain for enhanced membrane binding.

Filter: ESMFold structural validation to discard candidates that fail to maintain the predicted transmembrane helix.

Output: Top candidates for experimental testing in a cell-free expression system or in vivo.
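
The Analysis, Filter and Output stages of this schematic can be sketched as a small Python skeleton. This is an illustration under stated assumptions, not a working design pipeline: the scoring and fold-check functions are stand-ins for real ESM2 and ESMFold calls, the ProteinMPNN design step is replaced by simple single-substitution enumeration, and the toy sequence is a placeholder rather than MS2-L.

```python
from typing import Callable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def single_mutants(seq: str, positions: List[int]) -> List[Tuple[str, str]]:
    """Enumerate all single substitutions at the given 1-based positions as (label, sequence)."""
    variants = []
    for pos in positions:
        wt = seq[pos - 1]
        for aa in AMINO_ACIDS:
            if aa != wt:
                variants.append((f"{wt}{pos}{aa}", seq[:pos - 1] + aa + seq[pos:]))
    return variants

def run_pipeline(seq: str, positions: List[int],
                 score_fn: Callable[[str], float],
                 fold_ok_fn: Callable[[str], bool], top_n: int = 5):
    """Score every variant, drop those failing the fold check, return the top_n by score."""
    scored = [(score_fn(s), label, s) for label, s in single_mutants(seq, positions)]
    kept = [item for item in scored if fold_ok_fn(item[2])]
    return sorted(kept, reverse=True)[:top_n]

# Stand-ins so the sketch runs; a real pipeline would call ESM2 and ESMFold here
toy_seq = "MKLSLLAV"                    # placeholder, NOT the MS2-L sequence
toy_score = lambda s: s.count("L")      # hypothetical score
toy_fold_ok = lambda s: "P" not in s    # hypothetical fold filter

for score, label, _ in run_pipeline(toy_seq, [2, 4], toy_score, toy_fold_ok, top_n=3):
    print(label, score)
```

Keeping the scorer and the structural filter as plug-in functions means the same skeleton can be reused when the stand-ins are swapped for real model calls.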

References

Fogg, V. C., Lanning, N. J., & MacKeigan, J. P. (2011). Mitochondria in cancer: At the crossroads of life and death. Chinese Journal of Cancer, 30(8), 526–539. https://doi.org/10.5732/cjc.011.10018

Rennell, D., Bouvier, S. E., Hardy, L. W., & Poteete, A. R. (1991). Systematic mutation of bacteriophage T4 lysozyme. Journal of Molecular Biology, 222(1), 67–88. https://doi.org/10.1016/0022-2836(91)90738-r

Manning, M., & Colón, W. (2004). Structural basis of protein kinetic stability: Resistance to sodium dodecyl sulfate suggests a central role for rigidity and a bias toward β-sheet structure. Biochemistry, 43(35), 11248–11254. https://doi.org/10.1021/bi0491898

Doig, A. J. (2017). Frozen, but no accident – why the 20 standard amino acids were selected. The FEBS Journal, 284(9), 1296–1305. https://doi.org/10.1111/febs.13982

Zhang, S., & Egli, M. (2022). Hiding in plain sight: Three chemically distinct α-helix types. Quarterly Reviews of Biophysics, 55, e7. https://doi.org/10.1017/S0033583522000063

Kohnke, P., & Zhang, L. (2026). Expedient synthesis of n -protected/ c -activated unnatural amino acids for direct peptide synthesis. Journal of the American Chemical Society, 148(5), 5615–5622. https://doi.org/10.1021/jacs.5c20374

Cowing, K. (2023, April 5). How were amino acids formed before the origin of life on earth? Astrobiology. https://astrobiology.com/2023/04/how-were-amino-acids-formed-before-the-origin-of-life-on-earth.html

Chamakura, K. R., Edwards, G. B., & Young, R. (2017). Mutational analysis of the MS2 lysis protein L. Microbiology, 163(7), 961–969. https://doi.org/10.1099/mic.0.000485

Chamakura, K. R., Tran, J. S., & Young, R. (2017). MS2 lysis of Escherichia coli depends on host chaperone DnaJ. Journal of Bacteriology, 199(12), Article e00058-17. https://doi.org/10.1128/JB.00058-17

Mezhyrova, J., Martin, J., Börnsen, C., Dötsch, V., Frangakis, A. S., Morgner, N., & Bernhard, F. (2023). In vitro characterization of the phage lysis protein MS2-L. Microbiology Research Resources, 2(4), Article 28. https://doi.org/10.20517/mrr.2023.28

Strathdee, S. A., Hatfull, G. F., Mutalik, V. K., & Schooley, R. T. (2023). Phage therapy: From biological mechanisms to future directions. Cell, 186(1), 17–31. https://doi.org/10.1016/j.cell.2022.11.017

Gemini AI, AI prompt, How to use KNN neighbours to find neighbourhoods, and debugging code

Week 5 HW: Protein Design Part II

Part 1: Generate Binders with PepMLM

  1. Changed the 5th character of the sequence (residue 4 of the mature protein, which lacks the initiator Met) from A to V.
>sp|P00441|SOD1_HUMAN_A4V Superoxide dismutase [Cu-Zn] OS=Homo sapiens OX=9606 GN=SOD1 PE=1 SV=2
MATK**V**VCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTS
AGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVV
HEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
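The substitution in step 1 can be sketched in Python. This is a minimal illustration, not part of the PepMLM workflow: the helper name `apply_substitution` is hypothetical, and the wild-type N-terminus is assumed by reverting the V in the mutant sequence above back to A.

```python
def apply_substitution(sequence: str, position: int, ref: str, alt: str) -> str:
    """Return `sequence` with the 1-based `position` changed from `ref` to `alt`."""
    index = position - 1
    if sequence[index] != ref:
        raise ValueError(
            f"expected {ref} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + alt + sequence[index + 1:]


# Assumed wild-type N-terminus (mutant above with V reverted to A at character 5).
wild_type_start = "MATKAVCVLKGDGPVQ"

# Character 5 of the full sequence corresponds to A4V in mature-protein numbering.
mutant_start = apply_substitution(wild_type_start, position=5, ref="A", alt="V")
print(mutant_start)  # MATKVVCVLKGDGPVQ
```

Checking the reference residue before substituting guards against off-by-one errors between the two numbering schemes.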

2. Screenshot 2026-03-08 at 13.07.08.png

Screenshot 2026-03-08 at 13.26.38.png

Sequence 5 TKGNAGSRLACG WRGDDSVDFEGR 18.929941

Part 2: Evaluate Binders with AlphaFold3

  1. Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?

Protein-peptide sequence 1: SSTLRLFAQLRR, ipTM = 0.55, pTM = 0.67. The modest ipTM of 0.55 suggests the peptide is surface-bound; it is loosely associated with the outer part of the β-barrel. It is not buried, sits on the exterior surface of the AlphaFold model, and is not located near the N-terminus where the A4V mutation sits.

Protein-peptide sequence 2: GTGTASGALLLK, ipTM = 0.52, pTM = 0.68. The modest ipTM of 0.52 suggests the peptide is surface-bound; it is loosely associated with the outer part of the β-barrel. It is not buried, sits on the exterior surface of the AlphaFold model, and is not located near the N-terminus where the A4V mutation sits. Hydrophobic amino acid residues aligned across the protein-peptide interface may contribute to the score.

Protein-peptide sequence 3: ATVGVVQVIICN, ipTM = 0.64, pTM = 0.71. The higher ipTM of 0.64 indicates the most confident predicted interface of the four peptides, and the pTM of 0.71 indicates reasonable confidence in the overall A4V SOD1 fold. The peptide appears to wrap around the β-barrel and is surface-bound. It does not directly touch the N-terminal A4V site; instead it contacts residues in the middle of the protein sequence.

Protein-peptide sequence 4: WRGDDSVDFEGR, ipTM = 0.56, pTM = 0.65. The ipTM is above 0.5, suggesting a plausible interface. The peptide appears surface-bound, targets the edge of the β-barrel, does not localise near the N-terminus, and is not buried in the hydrophobic core.

4. In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.

The ipTM values across the four protein-peptide complexes range from 0.52 to 0.64, so all PepMLM-generated peptides meet the 0.5 threshold used here for a plausible protein-peptide interaction, although scores in this range indicate only moderate confidence. Sequence 3 (ATVGVVQVIICN) is the standout performer with an ipTM of 0.64, compared with the more moderate scores of Sequence 1 (0.55) and Sequence 4 (0.56). On this basis it appears to match or exceed the typical performance of generic binders in this system, likely due to its hydrophobic VVQVII motif, which may complement the destabilised environment of the A4V mutant SOD1. Sequence 2 recorded the lowest confidence at 0.52; it still passes the threshold, but like the other peptides it is predicted to be surface-bound rather than buried or targeted to the N-terminus.
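The comparison above can be reproduced with a short Python sketch. The scores are the ones recorded in Part 2; the 0.5 cut-off is the threshold used in this write-up, not an official AlphaFold3 value.

```python
# ipTM and pTM values as recorded in Part 2 above.
scores = {
    "SSTLRLFAQLRR": {"ipTM": 0.55, "pTM": 0.67},
    "GTGTASGALLLK": {"ipTM": 0.52, "pTM": 0.68},
    "ATVGVVQVIICN": {"ipTM": 0.64, "pTM": 0.71},
    "WRGDDSVDFEGR": {"ipTM": 0.56, "pTM": 0.65},
}
THRESHOLD = 0.5  # cut-off used in this write-up (an assumption, not a standard)

# Rank peptides from highest to lowest interface confidence.
ranked = sorted(scores, key=lambda pep: scores[pep]["ipTM"], reverse=True)
passing = [pep for pep in ranked if scores[pep]["ipTM"] >= THRESHOLD]

for peptide in ranked:
    print(peptide, scores[peptide]["ipTM"])
```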

Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse

Structural confidence alone is insufficient for therapeutic development. Using PeptiVerse, let’s evaluate the therapeutic properties of your peptide! For each PepMLM-generated peptide: Paste the peptide sequence. Paste the A4V mutant SOD1 sequence in the target field.

  1. Peptide sequence: SSTLRLFAQLRR (Screenshot 2026-03-08 at 15.58.33.png)

  2. Peptide sequence: GTGTASGALLLK (Screenshot 2026-03-08 at 16.01.23.png)

  3. Peptide sequence: ATVGVVQVIICN (Screenshot 2026-03-08 at 16.03.44.png)

  4. Peptide sequence: WRGDDSVDFEGR (Screenshot 2026-03-08 at 16.06.33.png)

3. Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties? Choose one peptide you would advance and justify your decision briefly.

The peptides generally show a positive association between structural confidence and predicted affinity, though the relationship is not strongly linear. Peptide 3 (ATVGVVQVIICN), with the highest structural confidence (ipTM = 0.64), had the highest predicted binding affinity (6.204 pKd/pKi). In contrast, Peptide 2 (GTGTASGALLLK) had the lowest structural confidence (ipTM = 0.52) and the weakest predicted affinity (5.555 pKd/pKi). All four peptides are predicted to be soluble (1.000 probability) and non-hemolytic, with hemolysis probabilities well below 10%. While Peptide 3 offers the highest binding affinity, its higher hydrophobicity (GRAVY = 1.83) and near-neutral charge (-0.21) may make it less stable in aqueous environments than Peptide 1 (SSTLRLFAQLRR), which combines a moderate structural fit (ipTM = 0.55) with good solubility and a favourable cationic charge (+2.46).

I would advance Peptide 3 (ATVGVVQVIICN) for further development, as it has the highest predicted binding affinity and structural confidence while still being predicted to be soluble and non-hemolytic.

Part 4: Generate Optimized Peptides with moPPIt

I tried to run this section, but the final step (generating samples.csv) throws the error below:

/content/moPPIt/flow_matching/path/path_sample.py:45: SyntaxWarning: invalid escape sequence '\s'
  x_t (Tensor): the sample along the path  :math:`X_t \sim p_t`.
/content/moPPIt/flow_matching/path/scheduler/scheduler.py:72: SyntaxWarning: invalid escape sequence '\s'
  SchedulerOutput: :math:`\alpha_t,\sigma_t,\frac{\partial}{\partial t}\alpha_t,\frac{\partial}{\partial t}\sigma_t`
/content/moPPIt/flow_matching/path/scheduler/scheduler.py:79: SyntaxWarning: invalid escape sequence '\k'
  Computes :math:`t` from :math:`\kappa_t`.
Traceback (most recent call last):
  File "/content/moPPIt/moppit.py", line 25, in <module>
    target_sequence = tokenizer(target, return_tensors='pt').to(device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py", line 800, in to
    self.data = {k: v.to(device=device) for k, v in self.data.items()}
                    ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py", line 417, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_164/1413049073.py in <cell line: 0>()
     38 ret = proc.wait()
     39 if ret != 0:
---> 40    raise RuntimeError(f"moo.py failed with code {ret}")

RuntimeError: moo.py failed with code 1
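The final `AssertionError: Torch not compiled with CUDA enabled` is what PyTorch raises when code asks for a CUDA device but the installed build or runtime has no CUDA support. How `moppit.py` chooses its `device` is not shown in the traceback, so the following is only a sketch of a common fallback pattern; `pick_device` is a hypothetical helper, not part of moPPIt. In a hosted notebook, switching to a GPU runtime may be the simpler fix.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" only when PyTorch is installed and can see a GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed in this environment
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


# Hypothetical usage: pass this value wherever the script calls `.to(device)`.
device = pick_device()
print(device)
```

Running on CPU avoids the assertion but may be far slower, so this is a debugging aid more than a full solution.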

Part C: Final Project: L-Protein Mutants

Q3 and Q4) The language model estimates how likely a mutation is to occur in nature, whereas the experimental data measure lysis. For this particular protein sequence, the language model embeddings do not capture enough information to be used as a predictive tool on their own, without fine-tuning or specialised training on viral functional assays. ESM-2 is commonly used to compute an Evolutionary Index (EI), but such a score cannot reflect gain of function or the functional nuances that lead to bacterial cell lysis. Protein levels can capture structural integrity and conformational stability; the score here was 0.088, indicating that the embeddings do not capture the structural and conformational constraints of the L-Protein. The correlation coefficient between the ESM-2 score and the lysis outcome was 0.0068, an extremely weak, essentially non-existent linear relationship. Furthermore, the mean score for mutations that cause lysis (-0.371) was very close to the mean for those that do not (-0.389), and a t-test gave a p-value of 0.958, so the difference in ESM-2 scores between lytic and non-lytic mutants is not statistically significant.
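The two statistics mentioned above can be sketched with the Python standard library. This assumes a Pearson correlation and a Welch-style t statistic; the exact tests used for the homework are not stated, and the numbers below are placeholders for illustration only, not the homework data.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def welch_t(a, b):
    """Welch's t statistic for the difference between two group means."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / standard_error


# Placeholder values (hypothetical): ESM-2 scores and lysis outcome (1 = lysis).
esm_scores = [-0.9, -0.2, -0.5, -0.1, -0.7, -0.3]
lysis = [1, 0, 0, 1, 0, 1]

r = pearson_r(esm_scores, lysis)
lytic = [s for s, outcome in zip(esm_scores, lysis) if outcome == 1]
non_lytic = [s for s, outcome in zip(esm_scores, lysis) if outcome == 0]
t = welch_t(lytic, non_lytic)
print(round(r, 3), round(t, 3))
```

Turning the t statistic into a p-value needs the t distribution, which a statistics package would normally provide.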

MS2-L protein structural breakdown:

  • Soluble region (N-terminal) typically spans residues 1–40, while the transmembrane (TM) region spans roughly residues 41–70.

Sites like the Leucine (L) at position 10 or the Cysteines (C) are often highly conserved across homologs.

Q5)

Screenshot 2026-03-09 at 00.14.01.png

Q6) [Multimer Assembly Assignment]

Week 6 HW: Genetic Circuits Part I: Assembly Technologies

Q1 What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?

  • Phusion DNA Polymerase - synthesises new DNA strands by adding nucleotides complementary to the template during amplification. Its 3′→5′ exonuclease (proofreading) activity gives high-fidelity copying.
  • Deoxynucleotides (dNTPs) - each consists of a deoxyribose sugar, a nitrogenous base (A, T, C, or G), and three phosphate groups. The DNA polymerase adds them to the growing DNA chain to form the new complementary strand.
  • Reaction Buffer - creates a suitable chemical environment for DNA synthesis by the Phusion High-Fidelity DNA Polymerase. It maintains a stable pH, provides magnesium ions as a cofactor for catalytic activity, and contains components that improve specificity, yield and fidelity.
  • Magnesium Chloride - the DNA polymerase requires Mg2+ in its active site to catalyse the formation of phosphodiester bonds between the 3′-OH of the primer and the α-phosphate of the incoming nucleotide.
  • DMSO - an additive that aids denaturation of difficult or GC-rich targets.

Q2 What are some factors that determine primer annealing temperature during PCR?

  • G/C content of the primer sequence - primers with higher GC content have a higher melting temperature, as G/C base pairs have three hydrogen bonds compared to two in A/T base pairs.
  • Primer length - longer primers have a higher melting temperature, so they need a higher annealing temperature.
  • Salt concentration - cations (like Na+) and magnesium ions stabilise the duplex and raise the melting temperature.
  • Mismatches - mutations in the target sequence and base-pair mismatches between the primer and the target reduce duplex stability and therefore the annealing temperature.
  • Denaturants - these reduce the stability of primer-template duplexes and lower the primer melting temperature.
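The first two factors (GC content and length) can be illustrated with the Wallace rule, a rough estimate for short primers that assigns 2 °C per A/T and 4 °C per G/C. This sketch ignores salt, mismatches and denaturants, and the example primer is hypothetical.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Wallace rule estimate of melting temperature in degrees Celsius."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


primer = "ATGGCTAGCAAAGGAGAAG"  # hypothetical primer, for illustration only
print(f"GC fraction: {gc_fraction(primer):.2f}, estimated Tm: {wallace_tm(primer)} °C")
```

For real primer design, a nearest-neighbour calculator that takes salt and primer concentration into account would give better estimates.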

Q3 There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.

PCR should be picked if:

  • There is little starting DNA material
  • There are no suitable restriction sites near the gene of interest
  • You want to introduce a specific mutation
  • You want to add sequences at the ends (as in Gibson Assembly)

Restriction enzymes should be picked if:

  • Cutting a circular DNA plasmid into a linear molecule to allow insertion of a DNA insert for cloning; because the backbone is not amplified, no polymerase errors are introduced into its sequence
  • Checking whether the plasmid carries the right DNA insert
  • The DNA is at high concentration and has suitable sites for RE cutting

Q4 How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?

  • Design the overlaps with enough GC content to promote stable annealing
  • Run the PCR products on a gel to confirm the correct sizes and the absence of contaminants, then purify the DNA from the gel to remove template DNA and enzymes
  • The fragments need to share 20–40 bp of homologous sequence at the ends of the insert(s) and linearised vector. Primers need to be designed with tails that match the adjacent fragment or vector.
  • The vector needs to be fully digested so that it is completely linearised
  • Use a high-fidelity polymerase
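The 20–40 bp homology requirement can be checked in silico before ordering primers. The sketch below uses a hypothetical helper, `shared_overlap`, and made-up sequences; it only compares the end of one fragment with the start of the next on the same strand.

```python
def shared_overlap(upstream: str, downstream: str,
                   min_len: int = 20, max_len: int = 40) -> int:
    """Length of the longest match between the end of `upstream` and the
    start of `downstream` within [min_len, max_len], or 0 if there is none."""
    upstream, downstream = upstream.upper(), downstream.upper()
    longest = min(max_len, len(upstream), len(downstream))
    for n in range(longest, min_len - 1, -1):
        if upstream[-n:] == downstream[:n]:
            return n
    return 0


# Hypothetical sequences: a 24 bp homology arm shared by insert and vector.
overlap = "GATTACAGGCTTCAGTCCATGACG"
insert = "ATGAGTAAAGGAGAAGAACTT" + overlap
vector = overlap + "CTCGAGCACCACCACCAC"

print(shared_overlap(insert, vector))  # 24
```

A return value of 0 would flag a junction whose primers need longer tails.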

Q5 How does the plasmid DNA enter the E. coli cells during transformation?

For DNA entry to occur, cells must first be made competent.

This is achieved through two main methods:

  • Chemical transformation (heat shock): Cells are treated with calcium chloride, which helps DNA bind to the cell surface. Quickly shifting the cells from an ice bath to a 42°C water bath is thought to make the membrane transiently permeable, allowing the DNA to enter the cell.
  • Electroporation: A short, high-voltage electric pulse is applied to the cell suspension, which physically creates temporary pores in the cell membrane through which the plasmid DNA can pass.

Q6 Describe another assembly method in detail (such as Golden Gate Assembly) Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online). Model this assembly method with Benchling or Asimov Kernel!

In Golden Gate cloning, circular plasmid DNA or linear PCR products are mixed in a single tube with a Type IIS restriction enzyme and DNA ligase. Type IIS enzymes cut outside their recognition sites, producing specific 4-base overhangs. Because the recognition sites are removed during assembly, they are not part of the final, seamless construct, which is therefore stable and will not be cut again. This allows digestion and ligation to happen in the same reaction, without the purification step otherwise required, which is useful for building genetic circuits. The tube is placed in a PCR machine, which cycles between digestion and ligation temperatures. The product is then transformed into E. coli, which is selected and propagated on an agar plate (Bird et al., 2022).

Diagram showcasing Golden Gate assembly.

       [Plasmid/PCR DNA 1]   [Plasmid/PCR DNA 2]   [Plasmid/PCR DNA 3]
        /           \     /           \     /           \
   --[BsaI]-- ---[BsaI]---[BsaI]--- ---[BsaI]--- ---[BsaI]--- 
      [Site1]     [Site2] [Site2]     [Site3] [Site3]     [Site4]

         |           |       |           |       |           |
         V           V       V           V       V           V
   [Overhang 1]----[Overhang 2]----[Overhang 3]----[Overhang 4]
   (Four-letter overhangs made - a sticky edge generated by digestion)

                       + T4 DNA Ligase (for joining DNA)
                       + Destination Vector (for transformation into *E. coli*)
                       + Type IIS Enzyme 
                       + 37°C/16°C Cycling (if the pieces are joined together incorrectly (with the Restriction enzyme recognition sites still attached), the scissors can cut them apart again. If they stick correctly and the RE recognition sites are removed, the scissors can't recognise them anymore.)
                               |
                               V
           [Final Assembled Product]
      [Vector]-<1>---[Fragment 1]---<2>---[Fragment 2]---<3>---[Fragment 3]---<4>
      (Stable, restriction sites removed, precise ordering)
      
(Bird et al., 2022)

Key 

BsaI - a Type IIS restriction enzyme

Overhang - single-stranded DNA that sticks out from the end of a double-stranded molecule.

Q6 (Part 2) To model Golden Gate Assembly in Benchling or Asimov Kernel, I start by digitizing my DNA fragments and cleaning them to remove any internal BsaI sites that might cause unwanted cutting. I then use the software’s Type IIS assembly tool to define my 4-base-pair overhangs, ensuring each piece slots into the next like a unique puzzle piece. I run a dry lab simulation to strip away the recognition sites and visualize the final, scarless plasmid map. This digital step is how I catch sequence errors and ensure my reading frames stay intact before I ever pick up a pipette.
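The two checks described above (no internal BsaI sites, and a defined 4-base overhang per part) can also be sketched in Python, independently of Benchling. BsaI recognises GGTCTC and cuts one base downstream on the top strand, leaving a 4-base 5′ overhang. The helper names and the example part are hypothetical, and only forward-oriented sites are handled when extracting overhangs.

```python
BSAI_SITE = "GGTCTC"


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_bsai_sites(seq: str) -> int:
    """Count BsaI sites on both strands."""
    seq = seq.upper()
    return seq.count(BSAI_SITE) + seq.count(reverse_complement(BSAI_SITE))


def bsai_overhangs(seq: str) -> list:
    """4-nt overhangs produced by forward-oriented BsaI sites (GGTCTC N ^ NNNN)."""
    seq = seq.upper()
    overhangs = []
    start = seq.find(BSAI_SITE)
    while start != -1:
        cut = start + len(BSAI_SITE) + 1  # skip the site plus one spacer base
        if cut + 4 <= len(seq):
            overhangs.append(seq[cut:cut + 4])
        start = seq.find(BSAI_SITE, start + 1)
    return overhangs


part = "AAGGTCTCAATGCTT"  # hypothetical part with one flanking BsaI site
print(count_bsai_sites(part), bsai_overhangs(part))
```

Overhangs that equal their own reverse complement, or that repeat between parts, should be avoided because they allow mis-assembly.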

This is the modelling in Benchling: https://benchling.com/s/seq-q2XW9o9mcjLBiH2V3lSA?m=slm-ex0hzqOg6DbZToMfb5Hi

The insert is GFP and the vector plasmid is pGDR.

Screenshot 2026-03-26 at 09.23.22.png

References

Bird, J. E., Marles-Wright, J., & Giachino, A. (2022). A user’s guide to golden gate cloning methods and standards. ACS Synthetic Biology, 11(11), 3551–3563. https://doi.org/10.1021/acssynbio.2c00355

New England Biolabs. (n.d.). Phusion High-Fidelity PCR Master Mix with HF Buffer. https://www.neb.com/en-gb/products/m0531-phusion-high-fidelity-pcr-master-mix-with-hf-buffer

Week 7 HW: Genetic Circuits Part II: Neuromorphic Circuits

Q1) What advantages do IANNs have over traditional genetic circuits, whose input/output behaviors are Boolean functions?

  • IANNs can be reconfigured into a new layout or function after their initial creation, whereas Boolean genetic circuits are fixed. This reconfigurability makes them suitable for analytical devices that combine biological recognition elements (enzymes, microorganisms, antibodies) with transducers to detect bioavailable pollutants (microplastics, heavy metals) and assess biotoxicity.
  • IANNs place less metabolic stress on cells than Boolean genetic circuits.
  • IANNs can perform complex pattern recognition and analog computation. They can take in many variables, such as temperature, wind direction, traffic speed and humidity, and derive relationships and biological signatures from them. The result is context-aware and probability-based (for example, a risk percentage or category).

Traditional genetic circuits work in the opposite way: the same processing step is a conditional if/then statement, and a Boolean circuit with n inputs is limited to 2^n input states. For example, if a city sensor receives an input that is “partly true” (e.g., moderate congestion but high humidity), a Boolean circuit may fail to trigger or may give an incorrect binary output.

  • In a Boolean circuit, every input typically has equal weight in flipping a gate, whereas IANNs allow synaptic weighting. You can tune promoter strengths or molecular affinities so that a “wind speed” input has a 70% influence on the output, while “humidity” has only 30%. This is critical for the “smart city” use case, where variables differ in importance.
  • IANNs can integrate many signals before reaching an activation threshold, so they act as natural filters for the noise that triggers false positives in traditional genetic circuits. They require a specific consensus of molecular inputs to fire.
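The difference between a Boolean gate and a weighted analog neuron can be sketched numerically. The 0.7 and 0.3 weights come from the example above; the bias, steepness and input values are assumptions chosen only for illustration.

```python
import math


def boolean_and(wind: float, humidity: float, cutoff: float = 0.5) -> int:
    """Boolean gate: both inputs must individually cross the cutoff."""
    return int(wind >= cutoff and humidity >= cutoff)


def weighted_neuron(wind: float, humidity: float,
                    w_wind: float = 0.7, w_humidity: float = 0.3,
                    bias: float = -0.5, steepness: float = 10.0) -> float:
    """Analog neuron: weighted sum of the inputs passed through a sigmoid."""
    z = w_wind * wind + w_humidity * humidity + bias
    return 1.0 / (1.0 + math.exp(-steepness * z))


# A "partly true" input: strong wind signal, weak humidity signal.
print(boolean_and(0.8, 0.2))                # 0: the AND gate does not fire
print(round(weighted_neuron(0.8, 0.2), 2))  # graded output between 0 and 1
```

Swapping the two input values lowers the analog output, because the heavily weighted input is now the weak one.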

Q2) Describe a useful application for an IANN; include a detailed description of input/output behavior, as well as any limitations an IANN might face to achieve your goal.

Chemical biosensor

  • Variable 1 (nitrogen dioxide): The concentration of nitrogen dioxide is the primary input, binding to specific receptors.
  • Variable 2 (VOCs): Volatile Organic Compounds (like benzene from car exhaust) act as a secondary input.
  • Variable 3 (Temperature): Temperature-sensitive proteins act as a “contextual weight.”
  • The IANN “recognizes” a pattern. For example: [High nitrogen dioxide] + [High VOCs] + [High Temperature] indicates a stagnant smog event. Because the signals are analog, the network can distinguish between a “passing truck” (short spike) and a sustained pattern.
  • Output: Proportional response: The output is not a binary “On/Off” but a graduated biological response proportional to the severity of the recognized pattern.
  • Remediation (The action): The organism secretes a specific enzyme (like nitroreductase) to break down nitrogen dioxide into non-harmful byproducts. In the IANN, if the pollution pattern is “moderate,” the organism produces a moderate amount of enzyme. If it is critical, it triggers maximum secretion.
  • The signal output - The system produces a Green Fluorescent Protein (GFP). The intensity of the “glow” represents the recognized pollution level, which can be mapped by city drones or satellite imaging to update a City Digital Twin in real-time.

Limitations

  • Metabolic stress: running a neural network inside a cell is energy-intensive, so the organism secreting the enzyme may grow more slowly or die earlier than its natural counterparts.
  • Loss of calibration: IANNs are tuned under controlled laboratory conditions, whereas in a Smart City a Bio-Sentinel might face natural disasters or physical damage. These outside stresses may shift the weights of the neural network unexpectedly, causing the system to lose its calibration.

Q3) Draw a diagram for an intracellular multilayer perceptron where layer 1 outputs an endoribonuclease that regulates a fluorescent protein output in layer 2.

INPUT X1                          INPUT X2
    (DNA: pCsy4)                      (DNA: pOutput)
         |                                 |
         | [Tx]                            | [Tx]
         v                                 v
      mRNA_Csy4                        mRNA_Output
         |                          (contains Csy4 site)
         | [Tl]                            |
         v                                 |
     [ Csy4 ]----------------------------> X (Cleavage/Inhibition)
    (Endoribonuclease)                     |
                                           |
                                           v
                                     [ NO PROTEIN ]
                                           or
                                    [ FP OUTPUT ] 
                                   (if X1 is absent)

How does the above diagram work?

In a neural network context, this biological setup functions as a NOT gate, or as a weighted input where X1 negatively regulates X2.

Transcription (Tx) & Translation (Tl): These represent the biological computation steps.

How does it work: The mRNA for the fluorescent protein (X2) contains a specific hairpin sequence. When the Csy4 protein (X1) is present, it recognizes and cleaves that hairpin.

Output: Cleavage usually destabilizes the mRNA or separates the ribosome-binding site from the coding sequence, effectively turning off the output.

Logic Table

| Input X1 (Csy4) | Input X2 (FP DNA) | Output (Fluorescence) |
|---|---|---|
| 0 (Absent) | 1 (Present) | High |
| 1 (Present) | 1 (Present) | Low to none |
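The logic table above can be reproduced with a toy two-layer model. This is a sketch under stated assumptions: Csy4 production is taken as proportional to X1, and repression of the fluorescent protein follows a Hill-type term. The function names and the parameters `k` and `n` are hypothetical, not measured values.

```python
def csy4_level(x1: float, max_csy4: float = 1.0) -> float:
    """Layer 1: Csy4 produced in proportion to input X1 (0 to 1)."""
    return max_csy4 * x1


def fluorescence(x2: float, csy4: float, k: float = 0.1, n: float = 2.0) -> float:
    """Layer 2: fluorescent protein output from X2, repressed by Csy4 cleavage."""
    return x2 / (1.0 + (csy4 / k) ** n)


# Reproduce the two rows of the logic table.
for x1 in (0, 1):
    output = fluorescence(x2=1.0, csy4=csy4_level(x1))
    print(f"X1={x1}, X2=1 -> relative fluorescence {output:.3f}")
```

Intermediate X1 values give intermediate fluorescence, which is the analog behaviour a perceptron layer relies on.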

Assignment Part 2: Fungal Materials

Q1) What are some examples of existing fungal materials and what are they used for? What are their advantages and disadvantages over traditional counterparts?

  • Mycelium leather for jackets, bags and coats - advantages are that it is renewable, biodegradable and sustainable. Disadvantages are that it is expensive, the production process is not yet scalable, it is not as durable (as reported by some users) or as easily repaired as traditional leather, and mycelium cultivation can be environmentally costly if it is not powered by renewable energy.
  • Mycelium for bioremediation - mycelium and its enzymes, like laccases and peroxidases, can degrade, sequester, or detoxify environmental toxins like heavy metals, plastics, oil and dyes (Dinakarkumar et al., 2024).

Q2) What might you want to genetically engineer fungi to do and why? What are the advantages of doing synthetic biology in fungi as opposed to bacteria?

I’d like to engineer fungi for medicinal purposes: fungi naturally produce secondary metabolites (like penicillin). Engineering these pathways allows us to produce high-value drugs like psilocybin (for depression) or cannabinoids (for pain) more sustainably than traditional farming (Keller, 2019). Unlike bacteria, fungi are eukaryotic: they can carry out complex protein folding for stability and post-translational modifications (glycosylation, methylation), secrete products out of the cell efficiently, and metabolise complex chemicals, whereas bacteria mostly break down simpler ones. Fungi can also form complex networks of filaments, while bacteria are unicellular.

Assignment Part 3: First DNA Twist Order

Insert: T7 Phage gp17 Receptor Binding Domain (RBD).

Coordinates: UniProt P03748, Residues 371–553. C-terminal receptor-binding region (residues 371–553), including the distal tip domain (466–553) containing specificity-determining loops.

Logic: This domain is responsible for host recognition. In my project, this is the “Variable” component that I am optimizing with ProteinMPNN to target P. aeruginosa.

1 AA Sequence for my Insert

I will use residues 371-553 from the UniProt entry https://www.uniprot.org/uniprotkb/P03748/entry#sequences. This is the portion of the protein which actually chooses which bacteria to kill.

GHVLQLESASDKAHYILSKDGNRNNWYIGRGSDNNNDCTFHSYVHGTTLTLKQDYAVVNKHFHVGQAVVATDGNIQGTKWGGKWLDAYLRDSFVAKSKAWTQVWSGSAGGGVSVTVSQDLRFRNIWIKCANNSWNFFRTGPDGIYFIASDGGWLRFQIHSNGLGFKNIADSRSVPNAIMVENE

2 Reverse translation from protein to DNA sequence

Reverse Translate results
Results for 183 residue sequence "Untitled" starting "GHVLQLESAS"

>reverse translation of Untitled to a 549 base sequence of most likely codons.
ggccatgtgctgcagctggaaagcgcgagcgataaagcgcattatattctgagcaaagat
ggcaaccgcaacaactggtatattggccgcggcagcgataacaacaacgattgcaccttt
catagctatgtgcatggcaccaccctgaccctgaaacaggattatgcggtggtgaacaaa
cattttcatgtgggccaggcggtggtggcgaccgatggcaacattcagggcaccaaatgg
ggcggcaaatggctggatgcgtatctgcgcgatagctttgtggcgaaaagcaaagcgtgg
acccaggtgtggagcggcagcgcgggcggcggcgtgagcgtgaccgtgagccaggatctg
cgctttcgcaacatttggattaaatgcgcgaacaacagctggaacttttttcgcaccggc
ccggatggcatttattttattgcgagcgatggcggctggctgcgctttcagattcatagc
aacggcctgggctttaaaaacattgcggatagccgcagcgtgccgaacgcgattatggtg
gaaaacgaa
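The reverse translation above can be reproduced with a small lookup table. The codon choices below are read off the tool output shown above (one codon per amino acid); the function name is hypothetical, and only the first 20 residues are used in the example.

```python
# Codon chosen for each amino acid, as seen in the reverse translation above.
MOST_LIKELY_CODON = {
    "A": "gcg", "C": "tgc", "D": "gat", "E": "gaa", "F": "ttt",
    "G": "ggc", "H": "cat", "I": "att", "K": "aaa", "L": "ctg",
    "M": "atg", "N": "aac", "P": "ccg", "Q": "cag", "R": "cgc",
    "S": "agc", "T": "acc", "V": "gtg", "W": "tgg", "Y": "tat",
}


def reverse_translate(protein: str) -> str:
    """Map each residue to a single fixed codon."""
    return "".join(MOST_LIKELY_CODON[residue] for residue in protein.upper())


first_20_residues = "GHVLQLESASDKAHYILSKD"
dna = reverse_translate(first_20_residues)
print(dna)       # matches the first 60 bases of the output above
print(len(dna))  # 60
```

The full insert has 183 residues, so the same mapping gives 183 × 3 = 549 bases, in line with the tool's report.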

3 Codon optimised sequence

GGCCACGTGCTGCAGCTGGAAAGCGCGAGCGATAAAGCGCATTATATTCTGAGCAAAGATGGCAATCGTAATAACTGGTACATCGGCCGCGGCAGCGATAATAACAATGATTGCACCTTCCATAGCTACGTGCACGGCACCACCCTGACCCTGAAACAGGATTATGCGGTGGTGAACAAACATTTCCACGTGGGCCAGGCAGTGGTCGCGACCGATGGCAACATTCAGGGCACCAAATGGGGCGGCAAATGGCTGGATGCGTATCTGCGCGATAGCTTTGTGGCGAAAAGCAAAGCCTGGACCCAGGTGTGGAGCGGCAGCGCCGGCGGCGGCGTGTCCGTGACCGTGAGCCAGGATCTGCGCTTTCGCAATATCTGGATTAAATGCGCGAATAATAGCTGGAACTTCTTCCGCACCGGCCCGGATGGCATTTACTTTATTGCCAGCGATGGCGGTTGGCTGCGCTTTCAGATTCACTCGAACGGCCTGGGCTTCAAAAACATTGCCGATAGCCGCAGCGTGCCGAACGCGATTATGGTGGAAAACGAA
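A useful sanity check after codon optimisation is to translate the DNA back and confirm that the protein is unchanged. The sketch below builds the standard genetic code table and translates the first 30 bases of the optimised sequence; the same function can be applied to the full sequence and compared with the insert's amino acid sequence.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


optimised_start = "GGCCACGTGCTGCAGCTGGAAAGCGCGAGC"  # first 30 bases from above
print(translate(optimised_start))  # GHVLQLESAS
```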

4

For phage protein expression, I will use the pET-28a(+) vector. I chose this vector because:

  • It has a robust T7 promoter for high-yield protein production.

  • It includes a Kanamycin resistance marker (standard for selection).

  • It features an N-terminal His-tag, which is essential for the proteomic validation.

Sequence insert - annotated link: https://benchling.com/s/seq-LYTiB6CS3kC0MuzSbY4s?m=slm-w9Gc9uoWLYMnC4Q5CcGe

Plasmid construct: https://benchling.com/s/seq-n18sRhXq6Yz6uol2epHg?m=slm-W6czjeckW1A2aaynxjBC

Genetic circuits prelab:

BUNLO PREACT CONEDATE ANYTE.jpeg

New Note.jpeg

Predict Circuit with Biocompiler-Predict.jpeg

References

Dinakarkumar, Y., Ramakrishnan, G., Gujjula, K. R., Vasu, V., Balamurugan, P., & Murali, G. (2024). Fungal bioremediation: An overview of the mechanisms, applications and future perspectives. Environmental Chemistry and Ecotoxicology, 6, 293–302. https://doi.org/10.1016/j.enceco.2024.07.002

Keller, N. P. (2019). Fungal secondary metabolism: Regulation, function and drug discovery. Nature Reviews Microbiology, 17(3), 167–180. https://doi.org/10.1038/s41579-018-0121-1

Week 9 HW: Cell-free Systems

Homework Part A: General and Lecturer-Specific Questions

General homework questions

[1] Explain the main advantages of cell-free protein synthesis over traditional in vivo methods, specifically in terms of flexibility and control over experimental variables. Name at least two cases where cell free expression is more beneficial than cell production.

  • Toxic proteins can be produced without affecting cell viability, unlike with in vivo methods. Examples are restriction enzymes, which cut DNA, and cytotoxic proteins (e.g. immune molecules produced by NK cells, such as perforins, which puncture cell membranes, and granzymes, which trigger apoptosis). The same applies to proteins that form misfolded, non-functional and often highly stable multi-molecular deposits.
  • Because the cell membrane barrier is removed in cell-free systems, labelled or unnatural amino acids can be added directly to the reaction mix (which contains machinery such as ribosomes) for targeted, specialised protein synthesis, for example labelling for NMR or fluorescence studies.
  • In in vivo methods, the cell’s energy and resources are divided between many cellular processes, whereas in CFS they can be focused solely on making the target protein.
  • Protein synthesis can be tracked in real time using spectrophotometry or other analytical methods, because the CFS reaction is open and uncomplicated compared with in vivo methods, where the cell has many organelles.
  • Reaction conditions can be controlled: variables such as temperature, redox potential and pH can be changed and fine-tuned without worrying about killing a host.
  • Cloning and transformation steps are skipped, as linear DNA or circular plasmid DNA can be used directly.

Two cases where CFS is beneficial over in vivo methods:

  1. We save time and resources by skipping the cloning and transformation needed in in vivo systems, as we can synthesise protein directly from linear PCR products. This is useful, for example, when screening many thousands of protein mutants for functional genomics or drug discovery.
  2. Non-natural amino acids can be added without having to cross the cell membrane present in in vivo systems. We can use this for site-specific labelling with fluorescent dyes or stable isotopes for NMR and X-ray crystallography.

[2] Describe the main components of a cell-free expression system and explain the role of each component.

Components include:

  1. Cell machinery - ribosomes for translating mRNA into protein, enzymes (aminoacyl-tRNA synthetases, RNA polymerase) and translation factors
  2. Circular plasmids / linear DNA - the genetic template that is transcribed and translated
  3. Nucleoside triphosphates - substrates for mRNA synthesis and the energy source (ATP, GTP) for transcription and translation
  4. Small molecules - amino acids as the building blocks of proteins, and Mg2+ and K+ ions, which stabilise the ribosome and act as cofactors for enzyme catalytic activity
  5. Buffers to maintain a stable pH
  6. Reducing agents that prevent oxidation during protein synthesis

[3] Why is energy provision regeneration critical in cell-free systems? Describe a method you could use to ensure continuous ATP supply in your cell-free experiment.

Protein synthesis consumes ATP continuously, and in a CFS ATP hydrolysis produces increasing levels of inorganic phosphate, which inhibits protein synthesis by sequestering essential magnesium ions (Whitaker, 2013). Without regeneration, the energy supply runs out and the reaction stops early. A Creatine Phosphate (CP) / Creatine Kinase (CK) system can ensure a continuous ATP supply: CK transfers the phosphate group from CP to ADP, regenerating ATP and leaving creatine, a byproduct which does not interfere with the translation machinery.

[4] Compare prokaryotic versus eukaryotic cell-free expression systems. Choose a protein to produce in each system and explain why.

Feature | Prokaryotic | Eukaryotic
Speed | Very fast | Slow
Cost | Low (source cells reproduce rapidly, have minimal nutritional needs and are easy to genetically manipulate) | High
Post-translational modifications | Few to none | Capable of folding and basic PTMs
Tolerance to toxic proteins and sequences | High for toxic proteins | Sensitive to particular viral/toxic sequences
Transcriptional template | Utilises circular/linear DNA | Usually requires capped/polyadenylated mRNA

GFP would be suitable to produce in a prokaryotic system as it does not require PTMs and is a non-glycosylated protein. It is a simple protein that does not need complicated assisted folding to reach its functional 3D structure. As E. coli systems synthesise protein quickly and in high volume, they are suitable for high-throughput screening of fluorescent reporter proteins like GFP, in order to verify system activity or test varying promoter strengths without the energy and resources used in eukaryotic processing.

Human Tissue Plasminogen Activator (tPA) would be suitable to produce in a eukaryotic system as it requires PTMs (multiple disulfide bonds and specific glycosylation patterns) to become a functional protein. Since it is a complex protein, producing it in a prokaryotic system would yield inactive, misfolded tPA in inclusion bodies, which can be toxic and nonfunctional. A eukaryotic system (Chinese hamster ovary / insect cell-free) provides the protein chaperones and microsomal membranes needed for correct folding and adds the sugar chains required for its function, thrombolysis (the breakdown of blood clots).

[5] How would you design a cell-free experiment to optimize the expression of a membrane protein? Discuss the challenges and how you would address them in your setup.

  1. Select the expression platform - eukaryotic (PTMs, and microsomes can assist in folding complex proteins) or prokaryotic (no PTMs, but fast and produces a large amount of protein)
  2. Provide a synthetic phospholipid bilayer - add liposomes/vesicles directly to the reaction to provide a landing site for the hydrophobic transmembrane regions, or use membrane scaffold proteins (MSPs) to produce nanodiscs which capture the protein in its native state. If lipids are not used, use detergents to keep the protein soluble
  3. Optimise the sequence and template for ribosome reading and translation - reduce mRNA secondary structure ("knots") near the start of the coding sequence and add N-terminal tags that act as ribosome handles

Challenges and solutions

  • Protein misfolding, which produces a nonfunctional 3D structure: produce the protein in the presence of nanodiscs / liposomes to provide immediate hydrophobic shielding.

  • Low yield: implement a continuous-exchange (CECF) system. A semi-permeable membrane separates the reaction from a feeding solution, which supplies fresh nutrients, removes inhibitory byproducts and extends the reaction time.

  • Incomplete folding: add molecular chaperones (like DnaK/DnaJ) or use eukaryotic cell lysates, which contain protein folding chaperones in microsomes.

  • Batch variability: use AI-driven active learning to standardise reaction protocols across various lysate batches.

[6] Imagine you observe a low yield of your target protein in a cell-free system. Describe three possible reasons for this and suggest a troubleshooting strategy for each.

  • Incomplete template or codon bias: Incorrect DNA sequences or rare codons can stall translation; we can troubleshoot by verifying the template integrity via sequencing or by using codon-optimised genes.

  • Resource depletion or inhibitors: High levels of metabolic byproducts or nucleases can degrade components; troubleshoot by using a continuous-exchange system or adding RNase inhibitors and fresh energy substrates.

  • Protein folding or solubility issues: Rapid synthesis can lead to misfolded proteins / inclusion bodies; we can troubleshoot by adding molecular chaperones or reducing the reaction temperature to slow down the translation rate.

Homework question from Kate Adamala Based on: Tang et al., 2017

Pick a function and describe it.

What it does: The synthetic cell acts as a translator which converts an enzyme-based signal into a genetic output.

Input: Glucose and an enzyme, glucose oxidase (GOx), which oxidises glucose into gluconic acid and hydrogen peroxide. GOx is often used in blood glucose level tracking, and to remove oxygen from bottled drinks and food packaging (such as mayonnaise and wine) to increase shelf life by preventing oxidation and bacterial proliferation.

Output: Hydrogen peroxide, which triggers a genetic response (protein expression) in a neighbouring population. GOx-produced hydrogen peroxide can alter gene expression by inducing oxidative stress-mediated pathways, activating stress-response mechanisms, triggering cell-specific defence pathways (e.g. p53) and repressing anti-aging genes (like Klotho).

Could this function be realized by cell-free Tx/Tl alone?

No, without encapsulation, the chemical gradients needed to trigger specific downstream signalling would dissipate. The protocell environment provides a high local concentration of DNA and transcription machinery, which would be too diluted in an open solution.

Could this function be realized by genetically modified natural cell?

In principle, yes, but natural cells have a complex metabolism that might interfere with the specific chemical intermediate (hydrogen peroxide). A synthetic cell allows for a noise-free channel by removing the background noise from a natural cell’s own redox homeostasis.

What would the membrane be made of?

This paper uses protein-polymer microcapsules (proteinosomes). For my assignment, I’d want to adapt it to POPC/POPG phospholipids to create vesicles that mirror natural anionic bacterial membranes more closely than single-lipid systems.

What would you encapsulate inside?

  • Bacterial Tx/Tl machinery (like PURE system).

  • Plasmids encoding the target protein.

  • Small molecule substrates (e.g., specific nucleotides and amino acids).

Which organism will your Tx/Tl system come from?

  • E. coli (a bacterial system). This is because the promoters used in these chemical communication studies (like the oxyR system for hydrogen peroxide sensing) are natively bacterial.

How will your synthetic cell communicate?

The membrane is designed to be semi-permeable to small molecules (glucose, hydrogen peroxide) but impermeable to the large DNA and enzymes inside. I may use alpha-hemolysin (aHL) pores to ensure the exit of larger output molecules.

Experimental details (Genes/Lipids)

Lipids: POPC (1-palmitoyl-2-oleoyl-glycero-3-phosphocholine).

Genes: The oxyS promoter (which is induced by oxidative stress) controlling the expression of mCherry or GFP.

Pore Gene: hla (encodes Alpha-hemolysin).

How will you measure function?

I’ll use fluorescence microscopy to see single synthetic cells glow after receiving the chemical signal.

I’ll use a plate reader to measure the kinetics of protein expression over time across the whole population.

Homework question from Peter Nguyen

Freeze-dried cell-free systems can be incorporated into all kinds of materials as biological sensors or as inducible enzymes to modify the material itself or the surrounding environment. Choose one application field — Architecture, Textiles/Fashion, or Robotics — and propose an application using cell-free systems that are functionally integrated into the material. Answer each of these key questions for your proposal pitch: Write a one-sentence summary pitch sentence describing your concept.

My proposal is to use freeze-dried cell-free (FDCF) systems embedded within a 3D-printed, biocompatible soft robotic lattice. The system will be a bridge between a robotic prosthetic and human tissue, capable of sensing physiological states (such as neuroendocrine activation, high energy use, insulin resistance, inflammatory and immune system response, dysregulated electrolyte and fluid levels, increased blood clotting and gastrointestinal issues such as vomiting and nausea), monitoring post-surgical tissue health in real time, and responding autonomously by secreting therapeutic enzymes to prevent infection and increase tissue healing.

How will the idea work, in more detail? Write 3-4 sentences or more.

I will use FDCF genetic circuits inside hydrogel microbeads within a 3D-printed matrix, which has a long shelf life until it is hydrated by interstitial fluid (ISF) or sweat. The circuits can recognise pathogen-associated molecular patterns (specific antigens or epitopes carried by particular pathogen strains) or inflammatory biomarkers (a drop in pH, or specific bacterial RNA). Once the cell-free system activates, it will trigger the release of antimicrobial peptides to reduce inflammation, and of signalling molecules which promote proliferation, differentiation and tissue repair to increase tissue healing. The material’s structural properties also change (e.g. swelling in a specific area of the body) to provide a mechanical signal to the user’s robotic prosthetic.

What societal challenge or market need will this address?

Conventional sensors require batteries and wiring to produce an output; the biosthetic graft is power-independent and biologically powered. It also has the benefit of localised synthesis of antimicrobial peptides, for instance to reduce bacteria-related inflammatory biomarkers.

Vascular disease and diabetes patients (10%) may come to rely on prosthetics after developing open sores or wounds on the bottom of the foot, which can lead to complex conditions such as gangrene and then to limb amputation. Patients with neuropathy cannot feel when a prosthetic causes a pressure-related sore or when a wound turns into a bacterial infection. Because this is noticed late, the damage can be irreversible. One size of prosthetic does not fit everyone, as limbs change size and shape during the day because of swelling or temperature. A static prosthetic cannot adapt, leading to friction between the skin and prosthetic and to skin breakdown.

The FDCF solution I propose therefore detects the infection before symptoms occur. It is a viscoelastic, 3D-printed graft with gradient stiffness, built as a lattice featuring macropores, which allow the skin to breathe and prevent skin maceration; micropores, which wick the interstitial fluid and deliver it to the embedded cell-free sensors; and nanopores, which allow sugars and RNA to reach the gene circuits. It will have regenerative properties too: if a small tear appears in the material, the cell-free material can produce collagen-resembling peptides to repair the microtear. We can use fluorescent reporter proteins like GFP to report when levels of bacterial infection are high, so that the graft changes colour to show this to the patient, alongside a light output warning them.

How do you envision addressing the limitation of cell-free reactions (e.g., activation with water, stability, one-time use)?

To address one-time use and the need for water to activate the FDCF, I would use microfluidic gating, where the robot’s CPU hydrates specific FDCF pixels only when required, so that unused pixels stay dry for later. For stability, I would use trehalose and silica (a pairing of a protective sugar and a mineral material) to lock components in a glassy state with a long shelf life. Together, these would allow multiple uses of the same graft.

Homework question from Ally Huang

Freeze-dried cell-free reactions have great potential in space, where resources are constrained. As described in my talk, the Genes in Space competition challenges students to consider how biotechnology, including cell-free reactions, can be used to solve biological problems encountered in space. While the competition is limited to only high school students, your assignment will be to develop your own mock Genes in Space proposal to practice thinking about biotech applications in space!

For this particular assignment, your proposal is required to incorporate the BioBits® cell-free protein expression system, but you may also use the other tools in the Genes in Space toolkit (the miniPCR® thermal cycler and the P51 Molecular Fluorescence Viewer). For more inspiration, check out https://www.genesinspace.org/ .

Provide background information that describes the space biology question or challenge you propose to address. Explain why this topic is significant for humanity, relevant for space exploration, and scientifically interesting. (Maximum 100 words)

The question I want to address: Can you engineer yeast or bacteria to produce medicine on demand in space?

Many Earth-made medicines can expire or degrade quickly due to space radiation. I want to program genetic circuits that trigger the production of vitamins or antibiotics only when needed. This is relevant, as it will reduce the massive cargo weight of medical supplies for deep-space transit.

Name the molecular or genetic target that you propose to study. Examples of molecular targets include individual genes and proteins, DNA and RNA sequences, or broader -omics approaches. (Maximum 30 words)

My targets will be PCUP1 and PGAL1, inducible promoters which will initiate drug synthesis when a non-lethal copper source or galactose is added, respectively.

Describe how your molecular or genetic target relates to the space biology question or challenge your proposal addresses. (Maximum 100 words)

My targets will be the inducible promoters: the copper-inducible promoter (PCUP1) and the galactose-inducible promoter (PGAL1). The promoters are tunable and inducible, meaning certain conditions such as the presence of ions are needed for an output (in this case drug synthesis). PCUP1 allows astronauts to start drug synthesis only when needed by adding a non-lethal copper source, reducing the metabolic burden on the yeast in contrast to constitutive promoters. Yeast deletion collections have shown that yeast can withstand space radiation and microgravity, making it a reliable chassis for genetic engineering in space. The copper-inducible system has also been successfully used to express human proteins (like gelatin) in related yeasts and is a reliable, high-yield system. Studies have shown that it is active in Pichia pastoris within 2 hours of copper induction, making it highly suitable for rapid-response drug production.

Clearly state your hypothesis or research goal and explain the reasoning behind it. (Maximum 150 words)

The goal is to develop an on-demand bioproduction system in Saccharomyces cerevisiae using a dual-input genetic switch to ensure metabolic efficiency in deep space. I hypothesise that a hybrid promoter system will allow for tighter, tunable control of medicinal protein synthesis. In this model, galactose acts as the primary “on” switch, while copper ions provide a secondary rheostat to modulate expression levels. This prevents the “leaky” expression often seen in single-inducer systems, which can lead to metabolic burden or plasmid instability during long-term storage in microgravity. By decoupling growth from production, we can maintain healthy “starter” cultures and trigger high-yield medicine synthesis only when specifically required.

Outline your experimental plan - identify the sample(s) you will test in your experiment, including any necessary controls, the type of data or measurements that will be collected, etc. (Maximum 100 words)

This experiment uses the BioBits® system to validate a hybrid promoter for on-demand protein synthesis. We will test three DNA templates: a wild-type (negative control), a constitutive promoter (positive control), and an experimental hybrid promoter driving GFP. To initiate synthesis, we will add varying concentrations of Galactose (0–2%) and Copper Sulphate (0–500 µM) to the BioBits pellets. We will use the miniPCR® for precise incubation at 37°C. Data collection includes real-time fluorescence quantification using the P51 Viewer to determine the optimal inducer ratio and maximum protein yield in microgravity.
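The plan above crosses three DNA templates with a range of galactose and copper sulphate concentrations. A minimal sketch of how the reaction list could be enumerated is given here; the individual concentration steps are illustrative assumptions (the plan only states the ranges 0–2% and 0–500 µM), and the template names are hypothetical labels.

```python
from itertools import product

# Illustrative levels only: the plan states the ranges (0-2 % galactose,
# 0-500 µM copper sulphate) but not the individual steps.
GALACTOSE_PERCENT = [0.0, 0.5, 1.0, 2.0]
COPPER_UM = [0, 50, 250, 500]
# Hypothetical labels for the three templates named in the plan.
TEMPLATES = ["negative_control", "constitutive_control", "hybrid_promoter"]


def build_conditions(templates, galactose_levels, copper_levels):
    """Return one dict per reaction: every template at every inducer pair."""
    return [
        {"template": t, "galactose_percent": g, "copper_uM": c}
        for t, g, c in product(templates, galactose_levels, copper_levels)
    ]


conditions = build_conditions(TEMPLATES, GALACTOSE_PERCENT, COPPER_UM)
print(len(conditions), "reactions")
print(conditions[0])
```

A full factorial layout like this makes the optimal inducer ratio readable directly from the fluorescence data, at the cost of more BioBits pellets than a reduced design would need.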

References

Tang, T.-Y. D., Cecchi, D., Fracasso, G., Accardi, D., Coutable-Pennarun, A., Mansy, S. S., Perriman, A. W., Anderson, J. L. R., & Mann, S. (2018). Gene-mediated chemical communication in synthetic protocell communities. ACS Synthetic Biology, 7(2), 339–346. https://doi.org/10.1021/acssynbio.7b00306

Whittaker, J. W. (2013). Cell-free protein synthesis: The state of the art. Biotechnology Letters, 35(2), 143–152. https://doi.org/10.1007/s10529-012-1075-4

Week 10 HW: Advanced Imaging & Measurement Technology

Homework: Final Project

For your final project:

  • Please identify at least one (ideally many) aspect(s) of your project that you will measure. It could be the mass or sequence of a protein, the presence, absence, or quantity of a biomarker, etc.

I will measure how well my redesigned receptor-binding domains perform in terms of their physical and chemical properties (stability, folding, binding) across multiple key metrics:

(1) Binding affinity: the thermodynamic strength of the binding between my redesigned T7 tail fiber and the target P. aeruginosa surface receptors (e.g., OprF, PilA).

(2) Structural stability: the Root Mean Square Deviation (RMSD) of the polypeptide chain over time, to ensure the redesign hasn’t introduced metastability or a failure to adopt the native, functional conformation.

(3) Solubility and aggregation propensity: the likelihood that the protein remains soluble rather than forming insoluble inclusion bodies during recombinant expression.

(4) Functional efficiency and biosafety/biosecurity compliance: the Codon Adaptation Index (CAI) as a measure of expression efficiency in the host, plus sequence screening to ensure the elimination of regulated DNA sequences from restricted, highly pathogenic agents (e.g. Ebola, SARS-CoV-2, anthrax).

  • Please describe all of the elements you would like to measure, and furthermore describe how you will perform these measurements.

I will perform these measurements using high-performance computational simulations that act as proxies for wet-lab benchtop techniques:

(1) Waters Mass Spectrometry (simulated) and sequence-level verification, to check the molecular weight and sequence fidelity of my redesigned 186 amino acid RBD constructs and make sure they match the theoretical design.

Rosetta and FoldX for measuring binding energy. A more negative value shows stronger predicted binding, similar to measuring a dissociation constant KD in a physical Surface Plasmon Resonance (SPR) assay.

(2) GROMACS and OpenMM with the CHARMM36 force field, which can simulate the protein in a virtual solvent at 37°C (310 K). I will measure structural integrity over a 100 ns trajectory. This replaces physical stability tests like Differential Scanning Fluorimetry (DSF).

(3) AlphaFold3 and ESMFold will be used to measure the pLDDT (Predicted Local Distance Difference Test) score. High confidence scores (>70) serve as a digital indication that the protein will adopt the functional secondary and tertiary structures.

(4) Biosecurity measurement - SecureDNA screening protocols - every redesigned sequence will be measured against global pathogen databases to ensure the host-range expansion complies with international safety standards.
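The RMSD metric in item (2) can be illustrated with a short sketch. This is not how GROMACS or OpenMM are invoked; it only shows the underlying formula, and it assumes each frame has already been superimposed on the reference (real analysis tools do a least-squares fit first). The coordinates are made-up toy values.

```python
import math


def rmsd(reference, frame):
    """RMSD in Å between two equally sized lists of (x, y, z) coordinates.

    Assumes the frame is already superimposed on the reference structure.
    """
    if len(reference) != len(frame) or not reference:
        raise ValueError("coordinate sets must be non-empty and the same length")
    squared = sum(
        (xr - xf) ** 2 + (yr - yf) ** 2 + (zr - zf) ** 2
        for (xr, yr, zr), (xf, yf, zf) in zip(reference, frame)
    )
    return math.sqrt(squared / len(reference))


def passes_stability(rmsd_values, threshold=2.5):
    """True if every frame stays below the RMSD threshold (in Å)."""
    return all(value < threshold for value in rmsd_values)


# Toy coordinates (hypothetical): three atoms, second frame shifted 1 Å in z.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
trajectory_rmsd = [rmsd(reference, reference), rmsd(reference, shifted)]
print(trajectory_rmsd)
print(passes_stability(trajectory_rmsd))
```

The default threshold of 2.5 Å is the cut-off quoted for the 100 ns simulations in this project; any other value can be passed in.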

  • What are the technologies you will use (e.g., gel electrophoresis, DNA sequencing, mass spectrometry, etc.)? Describe in detail.

I am executing a fully in silico protein engineering project to expand the host range of the T7 bacteriophage, specifically targeting antibiotic-resistant P. aeruginosa. My workflow begins with a 20-step computational pipeline where I use ESM3 and AlphaFold3 to map the structural constraints of the gp17 receptor-binding domain and dock it against clinical Pseudomonas surface receptors. I then employ ProteinMPNN to generate a diverse library of 1,000 sequence candidates, which I filter for solubility using CamSol and rank-order based on Rosetta ΔG binding energy. To validate these designs digitally, I measure their structural stability through 100 ns all-atom molecular dynamics simulations in GROMACS, ensuring an RMSD of less than 2.5 Å. Finally, I generate five synthesis-ready GenBank files, optimized for high-yield recombinant expression in E. coli BL21(DE3) with a 6x-His tag, and verify them through SecureDNA biosecurity screening to ensure my redesigned viral fibers are safe for future physical production by Twist Bioscience.

(1) Theoretical pI/Mw: 27,875.41 Da

(2) I picked the two tallest peaks. Peak 1 - 848.9758, Peak 2 - 875.4421

(Step 2.1) Determine z: z = 848.9758 / (875.4421 - 848.9758) = 32.08, so z is 32 rounded to the nearest integer

(Step 2.2) Actual MW: mw = 32 * (875.4421 - 1.0078) = 27,981.90 Da

(Step 2.3) Theoretical: 27,875.41 Da

Observed: 27,875.44 Da

PPM Error: 1.07 ppm
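The arithmetic in steps 2.1 to 2.3 can be checked with a short sketch. The function names are hypothetical; the peak values, the proton mass of 1.0078 and the theoretical and observed masses are the ones quoted above. Step 2.1 uses the simplified form z ≈ m1 / (m2 − m1); the exact relation subtracts the proton mass from m1 first, and both round to the same integer here.

```python
PROTON = 1.0078  # proton mass value used in the steps above


def charge_from_adjacent_peaks(mz_low, mz_high, proton=0.0):
    """Charge of the higher-m/z peak from two adjacent charge states.

    proton=0.0 reproduces the simplified formula of step 2.1;
    pass proton=PROTON for the exact relation (mz_low - proton) / (mz_high - mz_low).
    """
    return (mz_low - proton) / (mz_high - mz_low)


def neutral_mass(mz, z, proton=PROTON):
    """Neutral mass from an m/z value and its charge (step 2.2)."""
    return z * (mz - proton)


def ppm_error(observed, theoretical):
    """Mass error in parts per million (step 2.3)."""
    return (observed - theoretical) / theoretical * 1e6


z_estimate = charge_from_adjacent_peaks(848.9758, 875.4421)
z = round(z_estimate)
mass = neutral_mass(875.4421, z)
error = ppm_error(27875.44, 27875.41)
print(f"z estimate = {z_estimate:.3f}, rounded z = {z}")
print(f"neutral mass = {mass:.2f} Da")
print(f"ppm error = {error:.2f}")
```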

Homework: Waters Part III — Peptide Mapping - primary structure (1) How many Lysines (K) and Arginines (R) are in eGFP? 20 Lysine residues and 6 Arginine residues.

MVS (K) GEELFTGVVP ILVELDGDVN GH (K) FSVSGEGE GDATYG (K) LT (K) LFICTTG (K) L PVPWPTLVTT LTYGVQCFS (R) YPDHM (K) QHDFF (K) SAMPE GYVQE (R) TIFF (K) DDGNY (K) (R) AEV (K) FEGDTLVN (R) IEL (K) GID (K) FEDGNILGH (K) LEYNYNSHNV YIMAD (K) Q (K) NGI (K) VNF (K) (R) HNIEDGSVQL ADHYQQNTPI GDGPVLLPDN HYLSTQSALS (K) DPNE (K) (R) DHMVLLEFVT AAGITLGMDE LY (K) LEHHHHHH

(2) 19 peptides in total.

(3) Counting all the peaks identified as being >10% relative abundance in that list, we get a total of approximately 18–20 peaks in that window.

(4) The number of peaks doesn’t match the number of predicted peptides. There are fewer peaks in the chromatogram than the tool predicted (19 peptides).

(5) The most abundant peak is at m/z 525.76712. The zoom-in shows isotope spacing of ~0.5, confirming a charge z of 2. Calculation: (525.767 * 2) - 1.008 = 1050.526 Da, the mass of the singly protonated peptide [M+H]+.

(6) At RT 2.78 min, the peptide is identified as FEGDTLVNR. Observed Mass: 1050.518 Da, Expected Mass: 1050.52 Da, Mass Error: -3.60 ppm.

(7) Percentage of sequence confirmed: 88%
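The residue counts in answer (1) and the identity of the 2.78 min peptide can be cross-checked with a minimal sketch. It parses the annotated sequence exactly as written above and cleaves after every K or R; the proline exception to trypsin cleavage is ignored, which is safe here because no K or R in this sequence is followed by P. Any size or mass filtering that a digest tool applies to its peptide list is not modelled.

```python
import re

# Annotated sequence copied from answer (1); spaces and brackets are stripped below.
ANNOTATED = (
    "MVS (K) GEELFTGVVP ILVELDGDVN GH (K) FSVSGEGE GDATYG (K) LT (K) LFICTTG (K) L "
    "PVPWPTLVTT LTYGVQCFS (R) YPDHM (K) QHDFF (K) SAMPE GYVQE (R) TIFF (K) DDGNY (K) "
    "(R) AEV (K) FEGDTLVN (R) IEL (K) GID (K) FEDGNILGH (K) LEYNYNSHNV YIMAD (K) Q (K) "
    "NGI (K) VNF (K) (R) HNIEDGSVQL ADHYQQNTPI GDGPVLLPDN HYLSTQSALS (K) DPNE (K) (R) "
    "DHMVLLEFVT AAGITLGMDE LY (K) LEHHHHHH"
)

sequence = re.sub(r"[^A-Z]", "", ANNOTATED)
n_lys = sequence.count("K")
n_arg = sequence.count("R")


def tryptic_fragments(seq):
    """Split a sequence after every K or R (no proline rule, no missed cleavages)."""
    fragments, current = [], ""
    for residue in seq:
        current += residue
        if residue in "KR":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


fragments = tryptic_fragments(sequence)
print("Lysines:", n_lys, "Arginines:", n_arg)
print("FEGDTLVNR is a tryptic fragment:", "FEGDTLVNR" in fragments)
```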

Bonus Peptide Map Questions

(8) Since Figure 11 is the fragmentation of the 2.78 min peak, the sequence is FEGDTLVNR.

Theoretical Mass: 1050.52 Da

Validate by fragmentation tool: When I put the sequence FEGDTLVNR into the fragmentation predictor it generated a series of b-ions (fragments from the N-terminus) and y-ions (fragments from the C-terminus).

y1: 175.12 (R)

y2: 289.16 (NR)

y3: 388.23 (VNR)

y4: 501.31 (LVNR)

y5: 602.36 (TLVNR)

y6: 717.39 (DTLVNR)

The peak at 1050.52 Da in the spectrum represents the intact precursor peptide (the unfragmented molecule). This matches the calculated mass of the FEGDTLVNR peptide identified at the 2.78-minute retention time.

(9) The combination of high sequence coverage, accurate mass and successful fragmentation matching confirms that the sample is the eGFP standard. The small 12% gap in coverage (the white areas in Figure 6) is normal, as some peptides may be too small or too large to be easily detected under standard LC-MS conditions.
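The y-ion series listed in answer (8) can be reproduced from standard monoisotopic residue masses. The sketch below is an independent check, not the fragmentation tool used for the homework; it assumes singly charged y-ions (sum of the C-terminal residues plus one water plus one proton) and only includes the residues that occur in FEGDTLVNR.

```python
# Standard monoisotopic residue masses (Da) for the residues in FEGDTLVNR.
RESIDUE_MASS = {
    "G": 57.02146, "V": 99.06841, "T": 101.04768, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "E": 129.04259, "F": 147.06841,
    "R": 156.10111,
}
WATER = 18.010565
PROTON = 1.007276


def y_ions(peptide):
    """Singly charged y-ion m/z values, y1 first (C-terminal fragments)."""
    ions, running = [], WATER + PROTON
    for residue in reversed(peptide):
        running += RESIDUE_MASS[residue]
        ions.append(running)
    return ions


peptide = "FEGDTLVNR"
series = y_ions(peptide)
for n, mz in enumerate(series[:6], start=1):
    print(f"y{n}: {mz:.2f} ({peptide[-n:]})")
# The last entry covers the whole peptide, i.e. the intact [M+H]+ precursor.
print(f"[M+H]+: {series[-1]:.2f}")
```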

Homework: Waters Part IV — Oligomers

The species are identified by finding the peaks that most closely align with the multiples of the subunit masses.

7FU Decamer is the peak at 3.4 MDa.

8FU Didecamer is the peak at 8.33 MDa.

8FU 3-Decamer is the peak at 12.67 MDa.

8FU 4-Decamer is the small signal located around 16.0 MDa.

Homework: Waters Part V — Did I make GFP?

image.png image.png

Week 11 HW: Bioproduction & Cloud Labs

Part A: The 1,536 Pixel Artwork Canvas | Collective Artwork

(1) I didn’t manage to contribute to the artwork. (2) I liked the fact that every pixel removed or placed directly influences a cell-free protein synthesis optimisation experiment; it makes the art feel alive and purposeful. I have a couple of suggestions. Instead of removing pixels to end the experiment, perhaps next year could feature a growth versus decay mechanic where different biological inputs (represented by different colours) compete for dominance on the plate. It would also be interesting to have a secondary window showing a live feed or a time-lapse of the actual laboratory plate being manipulated by the cloud lab robots as we click, although the slider showing the bioart over time is incredible. To prevent griefing or to encourage rapid collaboration during peak hours, the cooldown could scale based on the complexity of the protein being synthesised in that specific quadrant.

Part B: Cell-Free Protein Synthesis | Cell-Free Reagents

(1) Referencing the cell-free protein synthesis reaction composition (the middle box outlined in yellow on the image above, also listed below), provide a 1-2 sentence description of what each component’s role is in the cell-free reaction.

E. coli Lysate (BL21 (DE3) Star lysate, includes T7 RNA Polymerase) - contains the protein translation machinery (ribosomes, tRNAs, elongation factors, chaperones) and, here, T7 RNA Polymerase, specifically included to transcribe T7-promoter-driven DNA into mRNA. T7 RNAP is preferred because it is a single-subunit enzyme, and therefore simpler and faster than the multi-subunit E. coli RNAP.

Salts/Buffer (K-glutamate, HEPES-KOH, Mg-glutamate, K-phosphate) - these maintain ionic strength, pH (~7.5) and Mg²⁺ concentration, all of which are critical for ribosome assembly and polymerase activity. Glutamate is used as the counterion partly because it is a natural E. coli cytoplasmic osmolyte.

Energy / Nucleotide System - (Ribose/Glucose AMP, CMP, GMP, UMP, Guanine) - Nucleotides are substrates for transcription (NTPs built from NMPs + phosphate donors). Ribose and glucose feed metabolic pathways that regenerate energy carriers. This group is where ATP regeneration strategy lives.

Translation Mix (Amino Acids) - Amino acids are the monomer substrates that the ribosome joins into protein. Tyrosine and cysteine are listed separately because of solubility issues at the stock concentration used for the other 17.

Nicotinamide - Acts as an NAD⁺ precursor, supporting redox reactions in the energy metabolism pathways that regenerate ATP.

Nuclease-Free Water - Backfills to the final reaction volume; being nuclease-free, it introduces no RNase contamination that would degrade the mRNA templates.

(2) The 1 hr incubation uses a PEP-based system in which NTPs are supplied directly, whereas the 20 hr incubation uses a ribose-based system (more sustainable and precursor-driven, with NTPs supplied indirectly).

(3) Bonus question: How can transcription occur if GMP is not included but Guanine is? It can because the cell-free extract contains enzymes that were freeze-dried and are reactivated by water. Those enzymes include the purine salvage pathway machinery. Once water rehydrates the extract and those enzymes become active again, hypoxanthine-guanine phosphoribosyltransferase (HGPRT) uses 5-phosphoribosyl 1-pyrophosphate (PRPP) as its necessary co-substrate to attach guanine to the ribose-5-phosphate sugar, forming guanosine monophosphate (GMP). Kinases in the extract can then phosphorylate GMP up to GTP for transcription.

Part C: Planning the Global Experiment | Cell-Free Master Mix Design

(1) Given the 6 fluorescent proteins we used for our collaborative painting, identify and explain at least one biophysical or functional property of each protein that affects expression or readout in cell-free systems. (Hint: options include maturation time, acid sensitivity, folding, oxygen dependence, etc) (1-2 sentences each)

sfGFP - can fold correctly and fluoresce whilst being fused to poorly folded proteins or expressed in the complex chemical environments of cell-free extracts.

mRFP1 - quite slow maturation time compared to newer variants, causes a significant delay between protein synthesis and the appearance of a red signal in the painting.

mKO2 - high oxygen dependence for chromophore maturation, meaning that if the cell-free reaction has not got sufficient oxygen, the orange readout can be very diminished.

mTurquoise2 - valued for its high photostability and quantum yield, providing a very bright and steady cyan readout that resists bleaching during prolonged imaging or observation.

mScarlet_I - features rapid maturation making it one of the fastest red fluorescent proteins for real-time tracking of expression in cell-free reactions.

Electra2 - fast-maturing and bright green-yellow protein optimised for high-speed readout, often used in systems where rapid signal generation is critical to distinguish early expression dynamics.

(2) Create a hypothesis for how adjusting one or more reagents in the cell-free mastermix could improve a specific biophysical or functional property you identified above, in order to maximize fluorescence over a 36-hour incubation. Clearly state the protein, the reagent(s), and the expected effect.

To maximize the fluorescence of mRFP1 over 36 hrs of incubation, increasing the concentration of glucose and ribose sugars would address its slow maturation time.

Hypothesis: By increasing the concentration of glucose and ribose, we could extend the metabolic lifespan of the cell-free reaction; this allows the reaction to remain energetically active long enough for the slow-maturing mRFP1 chromophore to fully transition into its fluorescent state before the system’s machinery degrades.

We would observe a significantly higher final fluorescence intensity for mRFP1 at the 36-hour mark, as the prolonged energy supply prevents the starvation of the reaction before the protein has had sufficient time to mature and glow.

(3) The second phase of this lab will be to define the precise reagent concentrations for your cell-free experiment. You will be assigned artwork wells with specific fluorescent proteins and receive an email with instructions this week (by April 24). You can begin composing master mix compositions here.

(3) image.png image.png image.png image.png

[
  {
    "quadrant": "Q1",
    "well_label": "H10",
    "supplements": [
      {
        "id": "nuclease_free_water",
        "supplemental_volume_nl": 1025
      },
      {
        "id": "aa_mix_17",
        "supplemental_volume_nl": 375
      },
      {
        "id": "ribose",
        "supplemental_volume_nl": 475
      },
      {
        "id": "glucose",
        "supplemental_volume_nl": 125
      }
    ]
  },
  {
    "quadrant": "Q1",
    "well_label": "H7",
    "supplements": [
      {
        "id": "nuclease_free_water",
        "supplemental_volume_nl": 1025
      },
      {
        "id": "aa_mix_17",
        "supplemental_volume_nl": 375
      },
      {
        "id": "ribose",
        "supplemental_volume_nl": 475
      },
      {
        "id": "glucose",
        "supplemental_volume_nl": 125
      }
    ]
  }
]
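A quick way to sanity-check a master mix plan like the one above is to total the supplement volumes for each well. The sketch below re-types the two wells compactly so that it runs on its own; the helper name is hypothetical, and the total it reports is simply the sum of the listed values, not a volume limit taken from the lab instructions.

```python
import json

# The two wells from the plan above, written compactly.
PLAN_JSON = """
[
  {"quadrant": "Q1", "well_label": "H10", "supplements": [
    {"id": "nuclease_free_water", "supplemental_volume_nl": 1025},
    {"id": "aa_mix_17", "supplemental_volume_nl": 375},
    {"id": "ribose", "supplemental_volume_nl": 475},
    {"id": "glucose", "supplemental_volume_nl": 125}
  ]},
  {"quadrant": "Q1", "well_label": "H7", "supplements": [
    {"id": "nuclease_free_water", "supplemental_volume_nl": 1025},
    {"id": "aa_mix_17", "supplemental_volume_nl": 375},
    {"id": "ribose", "supplemental_volume_nl": 475},
    {"id": "glucose", "supplemental_volume_nl": 125}
  ]}
]
"""


def total_supplement_nl(well):
    """Sum the supplemental volumes (in nL) listed for one well."""
    return sum(item["supplemental_volume_nl"] for item in well["supplements"])


plan = json.loads(PLAN_JSON)
totals = {well["well_label"]: total_supplement_nl(well) for well in plan}
print(totals)
```

Both wells carry the same recipe, so matching totals are expected; a mismatch would point to a typing error in one of the entries.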

Week 12 HW: Building Genomes

Post Lab Questions | Mandatory for All Students

(1) Which genes when transferred into E. coli will induce the production of lycopene and beta-carotene, respectively?

While E. coli naturally possesses the MEP pathway to produce the precursors IPP and DMAPP, it lacks the downstream enzymes required to synthesise lycopene. To enable lycopene production, the following three genes are required:

  • crtE: encodes geranylgeranyl pyrophosphate (GGPP) synthase

  • crtB: encodes phytoene synthase

  • crtI: encodes phytoene desaturase

The introduction of this gene set, collectively known as crtEBI, allows the recombinant strain to accumulate lycopene.

For beta-carotene, a fourth gene is added to crtEBI: the gene encoding lycopene cyclase, typically designated crtY, which converts lycopene into beta-carotene.

In addition to these primary synthesis genes, other sources highlight that production can be further enhanced by overexpressing genes like dxs (1-deoxy-D-xylulose-5-phosphate synthase) and idi (isopentenyl diphosphate isomerase) to increase metabolic flux toward the carotenoid pathway.

(2) Why do the plasmids that are transferred into the E. coli need to contain an antibiotic resistance gene?

In the Du et al. (2016) paper, the recombinant strain E. coli K12f-pACLYC carries the plasmid pACLYC, which contains the genes necessary for lycopene synthesis (crtE, crtB, and crtI). To ensure that the bacteria do not lose this plasmid as they divide, chloramphenicol (an antibiotic) is added to the culture medium. Because the plasmid provides resistance to chloramphenicol, only cells that retain the plasmid can survive and grow in the treated medium. This selective pressure keeps the plasmid in the population and ensures consistent lycopene production throughout the fermentation process.
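
The effect of selective pressure can be illustrated with a toy model. This is a deliberately simplified sketch and not taken from the paper: the function name `plasmid_fraction`, the 2% per-division loss rate, and the assumption that the antibiotic kills every plasmid-free cell are all hypothetical.

```python
def plasmid_fraction(generations, loss_rate, antibiotic):
    """Fraction of cells still carrying the plasmid after a number of generations.

    Toy assumptions: every cell divides once per generation, a fixed fraction
    of daughters from plasmid-bearing cells lose the plasmid, and the
    antibiotic (if present) kills all plasmid-free cells.
    """
    bearing, free = 1.0, 0.0  # start from a fully plasmid-bearing culture
    for _ in range(generations):
        new_bearing = 2 * bearing * (1 - loss_rate)
        new_free = 2 * free + 2 * bearing * loss_rate
        if antibiotic:
            new_free = 0.0  # plasmid-free cells cannot survive
        total = new_bearing + new_free
        bearing, free = new_bearing / total, new_free / total
    return bearing


with_cm = plasmid_fraction(50, 0.02, antibiotic=True)
without_cm = plasmid_fraction(50, 0.02, antibiotic=False)
print(f"with chloramphenicol:    {with_cm:.2f}")
print(f"without chloramphenicol: {without_cm:.2f}")
```

Under these assumptions the culture with antibiotic stays fully plasmid-bearing, while the culture without it steadily accumulates plasmid-free cells that no longer make lycopene.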

(3) What outcomes might we expect to see when we vary the media, presence of fructose, and temperature conditions of the overnight cultures?

Varying the growth conditions of the recombinant E. coli K12f-pACLYC strain demonstrates that fructose is the superior carbon source, outperforming glucose with a 3-fold increase in cell mass and a 7-fold increase in lycopene yield. This occurs because fructose uniquely reconfigures the bacteria’s metabolism: it up-regulates genes for its own transport while down-regulating pathways that produce waste (like acetate and lactate), leading to an accumulation of the essential precursors pyruvate and G-3-P. Additionally, fructose boosts the TCA cycle and oxidative phosphorylation to provide the abundant ATP and NADPH required for synthesis.

To achieve these outcomes, cultures must be maintained at 37 °C with chloramphenicol to prevent plasmid loss, with harvesting typically occurring at 14 hours for fructose-grown cells to capture the peak mid-growth phase.

(4) Generally describe what “OD600” measures and how it can be interpreted in this experiment.

OD600 is the optical density of a culture at 600 nm. It reflects how much light the cells scatter, so it serves as a measure of relative cell concentration. In this experiment, each sample’s absorption peak measurement for the relevant pigment is normalised by the OD600 of the corresponding bacterial culture, so pigment production is compared per unit of cells rather than per culture. This shows which culture conditions led to the highest production of either lycopene or beta-carotene.
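
The normalisation described above can be sketched as follows. The readings and condition names are made-up placeholders, not lab data, and the function name `normalised_pigment` is hypothetical.

```python
def normalised_pigment(pigment_absorbance, od600):
    """Pigment absorbance per unit of cell density (absorbance / OD600)."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return pigment_absorbance / od600


# Hypothetical readings for illustration only (not measured values).
samples = {
    "rich medium, 37 C": {"pigment_abs": 0.50, "od600": 2.50},
    "LB + fructose, 37 C": {"pigment_abs": 0.45, "od600": 1.80},
    "LB, 30 C": {"pigment_abs": 0.12, "od600": 0.80},
}

scores = {
    name: normalised_pigment(r["pigment_abs"], r["od600"])
    for name, r in samples.items()
}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("highest pigment per cell:", best)
```

In this made-up data the rich-medium culture has the highest raw pigment absorbance, but once divided by OD600 the fructose culture produces the most pigment per cell, which is the comparison the normalisation is meant to enable.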

(5) What are other experimental setups where we may be able to use acetone to separate cellular matter from a compound we intend to measure?

Based on the chemical properties of the compounds discussed, other experimental setups where acetone could be used to separate cellular matter from a target compound include:

  • Other carotenoids: since lycopene is a tetraterpenoid carotenoid, acetone is a standard choice for extracting other members of this pigment family, such as beta-carotene, which the sources also identify as a target for metabolic engineering in E. coli.
  • Plant tissues: the sources mention that lycopene is naturally obtained from plants like tomatoes. Acetone can be used to separate these pigments from plant tissues, specifically tomato peels, often in conjunction with cell-wall degrading enzymes to improve yield.
  • In situ extraction: while the primary method described involves post-growth extraction, the sources reference “organic/aqueous culture systems” for in situ extraction. In such a setup, an organic solvent could potentially be used during the fermentation process itself to continuously separate the lipophilic product from the cellular biomass.

(6) Why might we want to engineer E. coli to produce lycopene and beta-carotene pigments when Erwinia herbicola naturally produces them?