Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    9th February 2026

  • Week 2 HW: DNA Read, Write, & Edit

    Part 1: Benchling & In-silico Gel Art

    Part 3: DNA Design Challenge

    3.1. Choose your protein. I chose the PETase enzyme protein from the bacterium species Ideonella sakaiensis (strain 201-F6). I chose this protein as it was discovered to be important in plastic degradation. Its plastic degradation capabilities mean that it allows bioremediation by reducing plastic pollution and promoting a circular economy.

  • Week 3 HW: Lab Automation

    Assignment: Python Script for Opentrons Artwork - DUE BY YOUR LAB TIME!

    A Blooming Daisy Flower

    PINK, PURPLE & BLUE DESIGN! :)

    INITIAL DESIGN:

    Python documentation

from opentrons import types

metadata = {
    'author': 'Tammy Sisodiya',
    'protocolName': ' HTGAA Dazzling Daisy',
    'description': 'A blooming Daisy flower in Purple, Pink, and Blue.',
    'source': 'HTGAA 2026 Opentrons Lab',
    'apiLevel': '2.20'
}

##############################################################################
###   Robot deck setup constants
##############################################################################
TIP_RACK_DECK_SLOT = 9
COLORS_DECK_SLOT = 6
AGAR_DECK_SLOT = 5
PIPETTE_STARTING_TIP_WELL = 'A1'

# UPDATED: Mapping the new lab colors to source wells
well_colors = {
    'A1' : 'Purple',
    'B1' : 'Pink',
    'C1' : 'Blue'
}

def run(protocol):
    # Tips
    tips_20ul = protocol.load_labware('opentrons_96_tiprack_20ul', TIP_RACK_DECK_SLOT, 'Opentrons 20uL Tips')

    # Pipettes
    pipette_20ul = protocol.load_instrument("p20_single_gen2", "right", [tips_20ul])

    # Modules
    temperature_module = protocol.load_module('temperature module gen2', COLORS_DECK_SLOT)
    temperature_plate = temperature_module.load_labware('opentrons_96_aluminumblock_generic_pcr_strip_200ul', 'Cold Plate')
    color_plate = temperature_plate

    # Agar Plate
    agar_plate = protocol.load_labware('htgaa_agar_plate', AGAR_DECK_SLOT, 'Agar Plate')
    center_location = agar_plate['A1'].top()

    pipette_20ul.starting_tip = tips_20ul.well(PIPETTE_STARTING_TIP_WELL)

    # Helper Functions
    def location_of_color(color_string):
        for well,color in well_colors.items():
            if color.lower() == color_string.lower():
                return color_plate[well]
        raise ValueError(f"No well found with color {color_string}")

    def dispense_and_detach(pipette, volume, location):
        assert(isinstance(volume, (int, float)))
        above_location = location.move(types.Point(z=location.point.z + 5))
        pipette.move_to(above_location)
        pipette.dispense(volume, location)
        pipette.move_to(above_location)

    ### YOUR DESIGN DATA ###
    # One row of tuples per y value, each tuple is an (x, y) offset from the plate center
    sfgfp_points = [
        (-4.4, 26.4),(1.1, 26.4),(2.2, 26.4),(3.3, 26.4),(4.4, 26.4),
        (-6.6, 25.3),(-5.5, 25.3),(-4.4, 25.3),(-3.3, 25.3),(1.1, 25.3),(4.4, 25.3),
        (-12.1, 24.2),(-11, 24.2),(-9.9, 24.2),(-8.8, 24.2),(-7.7, 24.2),(-6.6, 24.2),(-2.2, 24.2),(0, 24.2),(4.4, 24.2),
        (-13.2, 23.1),(-12.1, 23.1),(-7.7, 23.1),(-6.6, 23.1),(-2.2, 23.1),(4.4, 23.1),(6.6, 23.1),(7.7, 23.1),(8.8, 23.1),
        (-13.2, 22),(-7.7, 22),(-6.6, 22),(-2.2, 22),(0, 22),(4.4, 22),(5.5, 22),(6.6, 22),(8.8, 22),
        (-13.2, 20.9),(-7.7, 20.9),(-2.2, 20.9),(3.3, 20.9),(4.4, 20.9),(9.9, 20.9),
        (-13.2, 19.8),(-7.7, 19.8),(-2.2, 19.8),(3.3, 19.8),(9.9, 19.8),
        (-13.2, 18.7),(-2.2, 18.7),(8.8, 18.7),(9.9, 18.7),
        (-13.2, 17.6),(-2.2, 17.6),(-1.1, 17.6),(8.8, 17.6),
        (-18.7, 16.5),(-17.6, 16.5),(-16.5, 16.5),(-15.4, 16.5),(-14.3, 16.5),(-12.1, 16.5),(-2.2, 16.5),(7.7, 16.5),(8.8, 16.5),
        (-20.9, 15.4),(-19.8, 15.4),(-18.7, 15.4),(-14.3, 15.4),(-13.2, 15.4),(-12.1, 15.4),(-2.2, 15.4),(-1.1, 15.4),(7.7, 15.4),
        (-20.9, 14.3),(-13.2, 14.3),(-12.1, 14.3),(-11, 14.3),(-3.3, 14.3),(-2.2, 14.3),(5.5, 14.3),(6.6, 14.3),
        (-20.9, 13.2),(-9.9, 13.2),(-8.8, 13.2),(-3.3, 13.2),(4.4, 13.2),(5.5, 13.2),
        (-20.9, 12.1),(-8.8, 12.1),(-7.7, 12.1),(-3.3, 12.1),(-2.2, 12.1),(2.2, 12.1),(3.3, 12.1),(5.5, 12.1),(6.6, 12.1),(7.7, 12.1),(8.8, 12.1),(9.9, 12.1),(11, 12.1),
        (-20.9, 11),(-19.8, 11),(-6.6, 11),(-5.5, 11),(-4.4, 11),(0, 11),(1.1, 11),(2.2, 11),(3.3, 11),(4.4, 11),(11, 11),
        (-19.8, 9.9),(-4.4, 9.9),(-3.3, 9.9),(-2.2, 9.9),(-1.1, 9.9),(0, 9.9),(11, 9.9),
        (-23.1, 8.8),(-22, 8.8),(-20.9, 8.8),(-19.8, 8.8),(-18.7, 8.8),(-17.6, 8.8),(-5.5, 8.8),(-4.4, 8.8),(-3.3, 8.8),(-2.2, 8.8),(-1.1, 8.8),(0, 8.8),(9.9, 8.8),(11, 8.8),(15.4, 8.8),(16.5, 8.8),(17.6, 8.8),(18.7, 8.8),(19.8, 8.8),(20.9, 8.8),(22, 8.8),(23.1, 8.8),(24.2, 8.8),
        (-23.1, 7.7),(-3.3, 7.7),(-1.1, 7.7),(0, 7.7),(1.1, 7.7),(2.2, 7.7),(8.8, 7.7),(9.9, 7.7),(14.3, 7.7),(15.4, 7.7),(16.5, 7.7),(20.9, 7.7),(22, 7.7),(24.2, 7.7),
        (-24.2, 6.6),(-23.1, 6.6),(-5.5, 6.6),(-4.4, 6.6),(-3.3, 6.6),(-2.2, 6.6),(1.1, 6.6),(2.2, 6.6),(7.7, 6.6),(8.8, 6.6),(9.9, 6.6),(11, 6.6),(12.1, 6.6),(13.2, 6.6),(14.3, 6.6),(16.5, 6.6),(19.8, 6.6),(20.9, 6.6),(23.1, 6.6),
        (-22, 5.5),(-9.9, 5.5),(-7.7, 5.5),(-6.6, 5.5),(-5.5, 5.5),(-4.4, 5.5),(-2.2, 5.5),(3.3, 5.5),(4.4, 5.5),(13.2, 5.5),(16.5, 5.5),(18.7, 5.5),(19.8, 5.5),(23.1, 5.5),
        (-20.9, 4.4),(-19.8, 4.4),(-18.7, 4.4),(-17.6, 4.4),(-16.5, 4.4),(-15.4, 4.4),(-14.3, 4.4),(-13.2, 4.4),(-12.1, 4.4),(-9.9, 4.4),(-8.8, 4.4),(-7.7, 4.4),(-2.2, 4.4),(4.4, 4.4),(5.5, 4.4),(6.6, 4.4),(13.2, 4.4),(16.5, 4.4),(17.6, 4.4),(18.7, 4.4),(23.1, 4.4),
        (-12.1, 3.3),(-11, 3.3),(-2.2, 3.3),(5.5, 3.3),(7.7, 3.3),(8.8, 3.3),(9.9, 3.3),(11, 3.3),(12.1, 3.3),(13.2, 3.3),(16.5, 3.3),(17.6, 3.3),(23.1, 3.3),
        (-13.2, 2.2),(-12.1, 2.2),(-2.2, 2.2),(6.6, 2.2),(9.9, 2.2),(16.5, 2.2),(17.6, 2.2),(18.7, 2.2),(19.8, 2.2),(20.9, 2.2),(22, 2.2),
        (-14.3, 1.1),(-2.2, 1.1),(7.7, 1.1),(9.9, 1.1),(11, 1.1),(15.4, 1.1),(16.5, 1.1),(22, 1.1),
        (-15.4, 0),(-14.3, 0),(-2.2, 0),(7.7, 0),(11, 0),(14.3, 0),(15.4, 0),(22, 0),
        (-15.4, -1.1),(-2.2, -1.1),(3.3, -1.1),(8.8, -1.1),(13.2, -1.1),(14.3, -1.1),(20.9, -1.1),
        (-15.4, -2.2),(-14.3, -2.2),(-13.2, -2.2),(-12.1, -2.2),(-8.8, -2.2),(-7.7, -2.2),(-6.6, -2.2),(-2.2, -2.2),(3.3, -2.2),(8.8, -2.2),(11, -2.2),(12.1, -2.2),(13.2, -2.2),(20.9, -2.2),
        (-14.3, -3.3),(-11, -3.3),(-8.8, -3.3),(-7.7, -3.3),(-6.6, -3.3),(-3.3, -3.3),(-2.2, -3.3),(3.3, -3.3),(4.4, -3.3),(8.8, -3.3),(11, -3.3),(19.8, -3.3),(20.9, -3.3),
        (-13.2, -4.4),(-12.1, -4.4),(-11, -4.4),(-8.8, -4.4),(-3.3, -4.4),(-2.2, -4.4),(4.4, -4.4),(5.5, -4.4),(6.6, -4.4),(7.7, -4.4),(8.8, -4.4),(13.2, -4.4),(19.8, -4.4),
        (-16.5, -5.5),(-15.4, -5.5),(-14.3, -5.5),(-13.2, -5.5),(-8.8, -5.5),(-4.4, -5.5),(-3.3, -5.5),(-2.2, -5.5),(4.4, -5.5),(14.3, -5.5),(15.4, -5.5),(16.5, -5.5),(18.7, -5.5),(19.8, -5.5),
        (-19.8, -6.6),(-18.7, -6.6),(-17.6, -6.6),(-16.5, -6.6),(-8.8, -6.6),(-4.4, -6.6),(-1.1, -6.6),(3.3, -6.6),(4.4, -6.6),(17.6, -6.6),(18.7, -6.6),
        (-23.1, -7.7),(-22, -7.7),(-20.9, -7.7),(-19.8, -7.7),(-17.6, -7.7),(-16.5, -7.7),(-15.4, -7.7),(-8.8, -7.7),(-7.7, -7.7),(-6.6, -7.7),(-5.5, -7.7),(-4.4, -7.7),(-3.3, -7.7),(-2.2, -7.7),(-1.1, -7.7),(0, -7.7),(2.2, -7.7),(3.3, -7.7),(16.5, -7.7),(17.6, -7.7),
        (-24.2, -8.8),(-23.1, -8.8),(-14.3, -8.8),(-13.2, -8.8),(-8.8, -8.8),(-7.7, -8.8),(-3.3, -8.8),(-2.2, -8.8),(0, -8.8),(1.1, -8.8),(2.2, -8.8),(3.3, -8.8),(5.5, -8.8),(14.3, -8.8),(15.4, -8.8),(16.5, -8.8),
        (-26.4, -9.9),(-25.3, -9.9),(-24.2, -9.9),(-12.1, -9.9),(-11, -9.9),(-9.9, -9.9),(-8.8, -9.9),(-3.3, -9.9),(0, -9.9),(7.7, -9.9),(8.8, -9.9),(11, -9.9),(12.1, -9.9),(13.2, -9.9),
        (-27.5, -11),(-26.4, -11),(-25.3, -11),(-24.2, -11),(-23.1, -11),(-22, -11),(-20.9, -11),(-19.8, -11),(-18.7, -11),(-17.6, -11),(-16.5, -11),(-15.4, -11),(-14.3, -11),(-13.2, -11),(-12.1, -11),(-11, -11),(-3.3, -11),(0, -11),
        (-28.6, -12.1),(-27.5, -12.1),(-19.8, -12.1),(-18.7, -12.1),(-17.6, -12.1),(-15.4, -12.1),(-12.1, -12.1),(-4.4, -12.1),(-3.3, -12.1),(-2.2, -12.1),(0, -12.1),
        (-28.6, -13.2),(-27.5, -13.2),(-20.9, -13.2),(-19.8, -13.2),(-12.1, -13.2),(-4.4, -13.2),(0, -13.2),
        (-26.4, -14.3),(-25.3, -14.3),(-13.2, -14.3),(-12.1, -14.3),(-5.5, -14.3),(-2.2, -14.3),(0, -14.3),
        (-23.1, -15.4),(-20.9, -15.4),(-19.8, -15.4),(-13.2, -15.4),(-8.8, -15.4),(-7.7, -15.4),(-6.6, -15.4),(-1.1, -15.4),(0, -15.4),
        (-18.7, -16.5),(-16.5, -16.5),(-15.4, -16.5),(-14.3, -16.5),(-13.2, -16.5),(-12.1, -16.5),(-9.9, -16.5),(-8.8, -16.5),(-2.2, -16.5),(0, -16.5),
        (-2.2, -17.6),(0, -17.6),
        (0, -18.7),
        (-1.1, -19.8),(1.1, -19.8),
        (-1.1, -20.9),(1.1, -20.9),(2.2, -20.9),
        (0, -22),(3.3, -22),(12.1, -22),(13.2, -22),(14.3, -22),(15.4, -22),
        (0, -23.1),(4.4, -23.1),(5.5, -23.1),(9.9, -23.1),(11, -23.1),(12.1, -23.1),(13.2, -23.1),
        (1.1, -24.2),(2.2, -24.2),(5.5, -24.2),(6.6, -24.2),(7.7, -24.2),(8.8, -24.2),(9.9, -24.2),(11, -24.2),(12.1, -24.2),
        (2.2, -25.3),(3.3, -25.3),(4.4, -25.3),(9.9, -25.3),(11, -25.3),
        (5.5, -26.4),(6.6, -26.4),(7.7, -26.4),(8.8, -26.4)]
    mrfp1_points = [(-15.4, 12.1),(-14.3, 12.1),(-14.3, 11),(-13.2, 11),(-12.1, 11)]
    mscarlet_i_points = [(-11, 20.9),(-9.9, 20.9),(-11, 19.8),(-9.9, 19.8),(-9.9, 18.7)]
    mko2_points = [(3.3, 18.7),(4.4, 18.7),(5.5, 18.7),(6.6, 18.7),(4.4, 17.6)]
    mjuniper_points = [(6.6, 9.9),(7.7, 9.9),(4.4, 8.8),(5.5, 8.8),(6.6, 8.8),(7.7, 8.8),(-6.6, 2.2),(-9.9, 1.1),(-8.8, 1.1),(-7.7, 1.1),(-6.6, 1.1)]
    electra2_points = [(1.1, 4.4),(1.1, 3.3),(1.1, 2.2),(1.1, 1.1)]

    # 2. UPDATED Design Mapping
    # Purple for the large petals, Pink for highlights, Blue for details.
    layers = [
        ('Purple', sfgfp_points),
        ('Pink', mrfp1_points),
        ('Pink', mscarlet_i_points),
        ('Pink', mko2_points),
        ('Blue', mjuniper_points),
        ('Blue', electra2_points)
    ]

    # 3. Execution Loop
    drop_vol = 1.0
    for color_name, points in layers:
        if not points:
            continue
        source_well = location_of_color(color_name)
        # One tip per batch of up to 15 drops
        for i in range(0, len(points), 15):
            chunk = points[i:i + 15]
            pipette_20ul.pick_up_tip()
            # 1 uL per drop plus 2 uL extra, capped at the 20 uL pipette capacity
            aspirate_vol = (len(chunk) * drop_vol) + 2.0
            if aspirate_vol > 20.0:
                aspirate_vol = 20.0
            pipette_20ul.aspirate(aspirate_vol, source_well)
            for x, y in chunk:
                # Only dispense at points within a 40 mm radius of the plate center (40**2 = 1600)
                if (x**2 + y**2) < 1600:
                    target_point = center_location.point + types.Point(x=x, y=y, z=0)
                    target_loc = types.Location(target_point, None)
                    dispense_and_detach(pipette_20ul, drop_vol, target_loc)
            # Return residual to source well top to avoid contamination
            if pipette_20ul.current_volume > 0:
                pipette_20ul.dispense(pipette_20ul.current_volume, source_well.top())
            pipette_20ul.drop_tip()

    Post-Lab Questions - DUE BY START OF FEB 24 LECTURE
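The batching arithmetic in the Week 3 execution loop (15 drops per tip, 1 µL per drop plus 2 µL extra, a 20 µL cap, and a 40 mm radius check) can be checked off-robot. The sketch below is a plain-Python approximation that needs no Opentrons install; `plan_batches` and `demo_points` are hypothetical names introduced here for illustration and are not part of the submitted protocol.

```python
def plan_batches(points, drop_vol=1.0, chunk_size=15, extra_vol=2.0,
                 max_vol=20.0, max_radius=40.0):
    """Mirror the protocol's loop: one batch per tip, with its aspirate
    volume and the points that pass the radius check."""
    batches = []
    for i in range(0, len(points), chunk_size):
        chunk = points[i:i + chunk_size]
        # Volume for the whole chunk plus a small excess, capped at pipette capacity
        aspirate_vol = min(len(chunk) * drop_vol + extra_vol, max_vol)
        # Keep only points strictly inside the circle around the plate center
        inside = [(x, y) for x, y in chunk if x**2 + y**2 < max_radius**2]
        batches.append((aspirate_vol, inside))
    return batches


# Hypothetical demo points (mm offsets from the plate center)
demo_points = [(0, 0), (1.1, 1.1), (39.0, 0), (40.0, 0), (30.0, 30.0)]
for vol, pts in plan_batches(demo_points, chunk_size=2):
    print(vol, pts)
```

Running a design's point list through a helper like this before the lab shows how many tips a layer will use and whether any point would be skipped by the radius check.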

  • Week 4 HW: Protein Design Part I

    Part B: Protein Analysis and Visualization In this part of the homework, you will be using online resources and 3D visualization software to answer questions about proteins.

  1. Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions. Briefly describe the protein you selected and why you selected it.

     I chose the p53 protein, which triggers programmed cell death when a cell has extensive DNA damage, for example from UV light, oxygen radicals or chemicals, as occurs in ailments like cancer. In a cancerous cell, the p53 protein will travel to the nucleus and signal the mitochondria to release reactive oxygen species or increase calcium levels. Other death factors released include cytochrome c, which activates caspases, and SMAC, which blocks survival proteins (Fogg et al., 2011). I selected this protein because mutations in it can cause cancer and it is vital for protecting the human genome from damage.

Subsections of Homework

Week 1 HW: Principles and Practices

9th February 2026

Tammy Sisodiya

Figure 1: Diagram of how the biocartridge works

QUESTION 2

I chose ensuring environmental and public health safety (Non-Maleficence) as the governance / policy goal. I defined some subgoals.

Subgoals:

  1. Microbial competition for natural resources like food, space and nutrients. Invasive bacteria entering water bodies may have selective, evolutionary advantages over native microbial populations, such as faster growth and more efficient nutrient use.

Policy requirement: We can use multi-layered biocontainment for bio-cartridges. For example, we can engineer the bacteria to metabolically require a nutrient for their growth and development that is not found in natural water bodies, so that they cannot outgrow and compete with native microbes. We can also encapsulate the bacteria in a hydrogel-based system that is resistant to environmental conditions, ensuring they cannot escape into water bodies.

Specific metric to assess the policy: If no detergent or clothes-washing chemicals are present, this should automatically trigger the genetic kill switch, which conditionally overexpresses the toxin gene of a toxin-antitoxin system or activates selective nutrient dependency.

Traceability: We must use DNA barcoding for bacterial species identification. The bacterial strain will carry a non-functional genetic sequence that allows regulators to trace the bacteria back to their manufacturer.

  2. Horizontal gene transfer: plastic recycling enzyme genes and antibiotic resistance genes transferring into pathogenic bacteria. If released, the plastic degrading enzymes may also affect important objects made of PET, such as dashboards, door panels, engine covers, ignition components, gear housings and connector housings in the automotive industry, and pill bottles in the pharmaceutical industry.

Policy requirement: We can ensure the bacteria do not carry antibiotic resistance genes in their genome that could be selected for and passed on by horizontal gene transfer.

Specific metric to assess the policy: We can use gene editing technologies like CRISPR-Cas9 to ensure no antibiotic resistance genes remain to be transferred to pathogenic microbes. Policies need to establish that PETase’s plastic breakdown does not create chemicals that are more toxic than the microplastics themselves.

When I asked Gemini, “What should I measure for governance/policy goals related to ensuring that this application or tool contributes to an ‘ethical’ future for a bacteria coculture based project?” the AI suggested Safety and Biocontainment Metrics, Environmental Justice & Sustainability & Equity and Global Governance (Google, 2026).

QUESTION 3

Proposed governance actions, each described by aspect (actor, purpose, design, assumptions, risks of failure and “success”):

Governance action 1: Regulating and approving the genetic kill switch and selective nutrient dependency mechanisms in bacteria (e.g., synthetic auxotrophy)

Actor: Federal regulators

Purpose: If the bio cartridge is damaged or destroyed and microbes escape, the permitted microbial survival rate can be set at <10^-8 (Cell Biology by the Numbers, 2015), so that an escaped organism is highly unlikely to survive, reducing the biosafety threat to ecosystems.

Design: Federal regulators will need to approve both the genetic kill switch mechanism, which uses toxin-antitoxin systems to kill the bacteria, and the gene editing that gives the engineered PETase plastic degrading bacteria a metabolic dependency on a selective nutrient.

Assumptions: We know the microbial composition of our existing water bodies and the nutrient dependencies of those microbes compared to our engineered microbe.

Risks of Failure & “Success”: Risk: Mutations may occur that bypass the engineered selective nutrient dependency. Success: Manufacturers can distinguish bacteria they have edited from natural microbial populations and can understand and control their growth.

Governance action 2: Providing an incentive for the sustainable upcycling of bio cartridges

Actor: Private companies and device manufacturing companies (Miele, Bosch, Hotpoint)

Purpose: Companies that currently produce filters do not combine user incentives with sustainability measures. Users will receive money off their energy bill or new cartridges when they return old cartridges for commercial sterilisation and nutrient reuse.

Design: Companies will finance the cartridge return program, from the user’s home to the bio cartridge manufacturing / sterilising and nutrient reuse depot.

Assumptions: We can assume users will keep the cartridge until they receive confirmation to return it, instead of disposing of it in the bin.

Risks of Failure & “Success”: Risk: Fossil fuel use and carbon dioxide emissions from transporting cartridges could exceed the environmental benefit of textile microplastic degradation and byproduct upcycling. Success: Promotes a circular economy.

Governance action 3: Providing the no marker policy for bacteria

Actor: Academic researchers

Purpose: Researchers will use gene editing to insert a unique short genetic sequence into the engineered bacterial genome, so that researchers and federal regulators can identify the bacteria if they escape and trace them back to the institution where they were produced.

Design: We will create a database of all the engineered bacterial genetic sequences so that escaped microbes can be differentiated and identified.

Assumptions: We can assume the unique sequence in the bacterial genome will not mutate over time.

Risks of Failure & “Success”: Failure: Selective pressure may remove DNA that does not serve a function in the genome. Success: There is a chance the no marker trait may be selected for.
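The barcode database in the third governance action can be pictured as a lookup from registered marker sequences to the institution that produced the strain. The following is a minimal sketch under stated assumptions: the registry contents, the 12-base barcodes and the name `trace_origin` are all hypothetical and chosen only for illustration, and a real system would need to tolerate mutations in the marker rather than rely on exact matching.

```python
# Hypothetical registry: barcode sequence -> producing institution.
# The sequences and institution names are made up for illustration.
BARCODE_REGISTRY = {
    "ACGTTGCAAGCT": "Institution A (hypothetical)",
    "TTGACCGTAGGA": "Institution B (hypothetical)",
}


def trace_origin(genome, registry=BARCODE_REGISTRY):
    """Return the institutions whose barcode occurs exactly in the genome."""
    genome = genome.upper()
    return [owner for barcode, owner in registry.items() if barcode in genome]


# A toy "sequenced isolate" carrying the first barcode between flanking bases
sample = "ggatcc" + "acgttgcaagct" + "ttaacc"
print(trace_origin(sample))
```

Exact substring matching is the simplest possible design choice here; it also makes the stated assumption visible, since a single mutation in the marker would make the strain untraceable in this sketch.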

QUESTION 4

5) PRIORITIES, TRADE-OFFS, ASSUMPTIONS AND ETHICAL CONCERNS

1 - Our audiences - The audiences are the United Nations Environment Programme (UNEP) Global Plastics Treaty and Industry Consortia. I prioritise a combination of regulation and extended producer responsibility (EPR) (OECD, 2001). Why? No single party owns the ocean microplastics pollution problem, so companies do not accept liability for it, which is why regulation is needed. EPR in turn gives manufacturers an incentive to turn recovered products like PHA into other important products such as sustainable packaging and agricultural films.

Once filled, bio cartridges can be sent back and, under the circular economy approach, the PHA collected can be used for other purposes such as sustainable packaging, agricultural films, medical devices and consumer goods. Under EPR, a consumer whose cartridge is full can send it back to the manufacturer for free and receive a discount on a new bio cartridge. A loyalty program will ensure that, after 3 uses, they receive a cartridge for free.

2 - Trade-offs include the cost of buying new cartridges and having them delivered, the cost of transporting old cartridges for cleaning and reuse, and the cost of returning cartridges for households with less income.

Attaching the bio cartridge to the machine will raise its price. We can mitigate this by making the bio cartridge an optional but recommended addition for reducing microplastic pollution. The government may also support this with subsidies.

Another tradeoff is the biosecurity risk associated with bacteria like B. subtilis and P. putida being used in the home. We mitigate this by using the immobilisation technique of anchoring the bacteria to a solid cellulose membrane, which increases biosafety and decreases risk of contamination in water bodies in the event the cartridge is damaged/broken.

3 - Assumptions and Uncertainties

The assumptions we make are:

Our PETase enzyme can survive the high alkalinity of detergents in the washing machine, even when optimised in silico. We must also assume users will return bio cartridges and will not dispose of them in the bin, as otherwise the circular economy approach using P. putida in layer 2 will be missed. The cost of harvesting PHA will far outweigh the cost of cartridge returning, sterilisation and nutrient recycling processes and sending back to the consumer. However, we do not know whether P. putida will lose its upcycling ability by mutating over multiple generations.

4 - Ethical concerns we may have

Possibility of microbe leakage into natural water bodies, which is a biocontainment hazard.

Is all microplastic pollution the individual’s responsibility? If the biocartridge is not an optional addition, its cost may disadvantage lower income households and shift all liability from manufacturers to individuals. Laundry must be affordable for all.

What are the proposed governance actions for these ethical concerns?

  • We can use genetic kill switches together with metabolic dependency on a selective nutrient, so that the bacteria die when that nutrient is absent.
  • Manufacturers fund the return of cartridges by consumers and their thorough cleaning before they are sent back, so the cost does not fall on the consumer.
  • Biosafety regulators will certify all bio cartridges, making sure they are tested for breaking/damage and that the non-functional DNA watermark has been added to the bacterial genome and recorded in the DNA barcoding database (which is fully open source).

Week 2 Lecture Prep

Homework Questions from Professor Jacobson:

  1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome? How does biology deal with that discrepancy?

     The error rate of DNA polymerase is between 10^-4 and 10^-6 per base copied. The human genome is composed of 3 billion base pairs, so even at the lower rate every cell division would end up with 3,000–6,000 mistakes. Biology deals with the discrepancy through 3′→5′ exonuclease activity, which proofreads the DNA for errors and takes off wrong bases straight away. After DNA is replicated, mismatch repair occurs: the freshly replicated DNA is checked for remaining mismatches, and the incorrect base is removed and replaced with the right one.

  2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

     With 4 possible bases at each position and a length of 1036 base pairs for an average human protein, the total number of possible variations is 4^1036.

     log10(4^1036) = 1036 × log10(4) = 1036 × 0.60206 = 623.734, which gives approximately 5.42 × 10^623 possible DNA sequences. In practice, not all of these work. Some codons are stop codons, which terminate the sequence early. The sequence may form a hairpin structure that blocks polymerases or prevents the ribosome from binding, so no protein is made. Codon choice also matters for the organism (bacteria and humans use different codons): if a sequence uses codons for which the cell has few transfer RNAs, the ribosome will stall and produce no protein.
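The two pieces of arithmetic in Professor Jacobson's questions can be reproduced in a few lines. This is a small sketch using only the figures quoted in the answers (3 billion base pairs, error rates of 10^-4 to 10^-6, and a 1036-base sequence); the function names are illustrative.

```python
import math

GENOME_BP = 3_000_000_000  # haploid human genome size used in the answer


def expected_errors(error_rate, length_bp=GENOME_BP):
    """Expected number of uncorrected mistakes when copying length_bp bases."""
    return error_rate * length_bp


for rate in (1e-4, 1e-6):
    print(f"error rate {rate:g}: about {expected_errors(rate):,.0f} mistakes per copy")


def log10_sequence_count(length_bp, alphabet_size=4):
    """log10 of alphabet_size ** length_bp, computed without huge integers."""
    return length_bp * math.log10(alphabet_size)


exponent = log10_sequence_count(1036)
whole = math.floor(exponent)
mantissa = 10 ** (exponent - whole)
print(f"4^1036 is about {mantissa:.2f} x 10^{whole}")
```

Working in logarithms avoids printing a 624-digit integer and makes the split into mantissa and power of ten explicit.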

Homework Questions from Dr. LeProust:

  1. What’s the most commonly used method for oligo synthesis currently? The most common method used is solid phase phosphoramidite chemistry.

  2. Why is it difficult to make oligos longer than 200nt via direct synthesis? The quality and the amount of product obtained from chemical synthesis become poor once the sequence length exceeds 200nt.

  3. Why can’t you make a 2000bp gene via direct oligo synthesis? A 2000bp gene cannot be made via direct oligo synthesis because the process will be error prone, the amount obtained will be low and the quality will be reduced. Directly synthesised sequences should be no longer than 150–200 bases.
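A rough way to see why yield collapses with length is to multiply the per-step coupling efficiency across every base added. The sketch below assumes a coupling efficiency of 99% per step, which is an illustrative figure and not a value from the lecture; `full_length_fraction` is a hypothetical helper name.

```python
def full_length_fraction(length_nt, coupling_efficiency=0.99):
    """Fraction of strands that reach full length, assuming each of the
    length_nt - 1 coupling steps succeeds independently with the same
    efficiency (an idealised model; 0.99 is an assumed value)."""
    return coupling_efficiency ** (length_nt - 1)


for n in (20, 200, 2000):
    print(f"{n} nt: {full_length_fraction(n):.3g} of strands are full length")
```

Under this assumption a 200-mer already loses most of its strands to truncation, and a 2000-base product is effectively absent, which matches the qualitative answers above.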

Homework Question from Prof. Church:

What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

Histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan and valine. Mammals cannot produce lysine themselves and rely on getting it from foods that contain it, such as beef, pork, chicken, fish (cod, sardines), dairy and soy products. This is contrary to what is insinuated in the Lysine Contingency, where dinosaurs are genetically modified to be unable to produce lysine (Lopez & Mohiuddin, 2024).

References

Bao, T., Qian, Y., Xin, Y., Collins, J. J., & Lu, T. (2023). Engineering microbial division of labor for plastic upcycling. Nature Communications, 14(1), 5712. https://doi.org/10.1038/s41467-023-40777-x

Cell biology by the numbers. (2015). Garland Science. https://doi.org/10.1201/9780429258770

Federley, R. G., & Romano, L. J. (2010). DNA polymerase: Structural Homology, Conformational Dynamics, and the Effects of Carcinogenic DNA Adducts. Journal of Nucleic Acids, 2010, 457176. https://doi.org/10.4061/2010/457176

Google. (2026). Gemini (Feb 10 version) [Large language model]. https://gemini.google.com

“Kill switch” design strategies for genetically modified organisms | Physical and Life Sciences Directorate. (n.d.). Retrieved 10 February 2026, from https://pls.llnl.gov/article/26016/kill-switch-design-strategies-genetically-modified-organisms

Lopez, M. J., & Mohiuddin, S. S. (2025). Biochemistry, essential amino acids. In StatPearls. StatPearls Publishing. http://www.ncbi.nlm.nih.gov/books/NBK557845/

Microfibres: The plastic in our clothes | Friends of the Earth. (n.d.). Retrieved 10 February 2026, from https://friendsoftheearth.uk/plastics/microfibres-plastic-in-our-clothes

Microfiber filter. (n.d.). PlanetCare. Retrieved 10 February 2026, from https://planetcare.org/pages/microfiber-filter-washing-machine

OECD. (2001). Extended producer responsibility: A guidance manual for governments. OECD. https://doi.org/10.1787/9789264189867-en

Planetcare | the most effective solution to stop microfiber pollution. (n.d.). PlanetCare. Retrieved 10 February 2026, from https://planetcare.org/

Team:Exeter/Hardware—2019.igem.org. (n.d.). Retrieved 10 February 2026, from https://2019.igem.org/Team:Exeter/Hardware

Vassilenko, K., Watkins, M., Chastain, S., Posacka, A., & Ross, P. S. (2019). Me, my clothes and the ocean: The role of textiles in microfiber pollution (Ocean Wise Science Feature). Ocean Wise Conservation Association. https://assets.ctfassets.net/fsquhe7zbn68/4MQ9y89yx4KeyHv9Svynyq/8434de64585e9d2cfbcd3c46627c7a4a/Research_MicrofibersReport_191004-e.pdf

World Health Organization. (2022). Global guidance framework for the responsible use of the life sciences: Mitigating biorisks and governing dual-use research. https://www.who.int/publications/i/item/9789240056107

DNA Animation from LottieFiles: https://lottiefiles.com/free-animation/genetics-iHhxPhbgLp

Week 2 HW: DNA Read, Write, & Edit

Part 1: Benchling & In-silico Gel Art

[Image: virtual_digest_sequence_LAMCG (3).png]

Part 3: DNA Design Challenge

3.1. Choose your protein.

I chose the PETase enzyme protein from the bacterium species Ideonella sakaiensis (strain 201-F6). I chose this protein as it was discovered to be important in plastic degradation. Its plastic degradation capabilities mean that it allows bioremediation by reducing plastic pollution and promoting a circular economy.

FASTA sequence of PETase:

>sp|A0A0K8P6T7|PETH_PISS1 Poly(ethylene terephthalate) hydrolase OS=Piscinibacter sakaiensis OX=1547922 GN=ISF6_4831 PE=1 SV=1
MNFPRASRLMQAAVLGGLMAVSAAATAQTNPYARGPNPTAASLEASAGPFTVRSFTVSRP
SGYGAGTVYYPTNAGGTVGAIAIVPGYTARQSSIKWWGPRLASHGFVVITIDTNSTLDQP
SSRSSQQMAALRQVASLNGTSSSPIYGKVDTARMGVMGWSMGGGGSLISAANNPSLKAAA
PQAPWDSSTNFSSVTVPTLIFACENDSIAPVNSSALPIYDSMSRNAKQFLEINGGSHSCA
NSGNSNQALIGKKGVAWMKRFMDNDTRYSTFACENPNSTRVSDFRTANCS

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

Reverse translation of PETase:

Reverse Translate Results for 290 residue sequence “Untitled” starting “MNFPRASRLM”

Reverse translation of Untitled to an 870 base sequence of most likely codons:

atgaactttccgcgcgcgagccgcctgatgcaggcggcggtgctgggcggcctgatggcg
gtgagcgcggcggcgaccgcgcagaccaacccgtatgcgcgcggcccgaacccgaccgcg
gcgagcctggaagcgagcgcgggcccgtttaccgtgcgcagctttaccgtgagccgcccg
agcggctatggcgcgggcaccgtgtattatccgaccaacgcgggcggcaccgtgggcgcg
attgcgattgtgccgggctataccgcgcgccagagcagcattaaatggtggggcccgcgc
ctggcgagccatggctttgtggtgattaccattgataccaacagcaccctggatcagccg
agcagccgcagcagccagcagatggcggcgctgcgccaggtggcgagcctgaacggcacc
agcagcagcccgatttatggcaaagtggataccgcgcgcatgggcgtgatgggctggagc
atgggcggcggcggcagcctgattagcgcggcgaacaacccgagcctgaaagcggcggcg
ccgcaggcgccgtgggatagcagcaccaactttagcagcgtgaccgtgccgaccctgatt
tttgcgtgcgaaaacgatagcattgcgccggtgaacagcagcgcgctgccgatttatgat
agcatgagccgcaacgcgaaacagtttctggaaattaacggcggcagccatagctgcgcg
aacagcggcaacagcaaccaggcgctgattggcaaaaaaggcgtggcgtggatgaaacgc
tttatggataacgatacccgctatagcacctttgcgtgcgaaaacccgaacagcacccgc
gtgagcgattttcgcaccgcgaactgcagc
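Reverse translation with "most likely codons" amounts to replacing each amino acid with one fixed codon. The sketch below is not the tool's implementation; the codon choices in `MOST_LIKELY_CODON` were read off the output above (one codon per amino acid, as it appears there), and the function name is illustrative.

```python
# One codon per amino acid, as observed in the reverse-translation output above.
MOST_LIKELY_CODON = {
    "A": "gcg", "R": "cgc", "N": "aac", "D": "gat", "C": "tgc",
    "Q": "cag", "E": "gaa", "G": "ggc", "H": "cat", "I": "att",
    "L": "ctg", "K": "aaa", "M": "atg", "F": "ttt", "P": "ccg",
    "S": "agc", "T": "acc", "W": "tgg", "Y": "tat", "V": "gtg",
}


def reverse_translate(protein):
    """Map each residue to its single preferred codon and join the result."""
    return "".join(MOST_LIKELY_CODON[residue] for residue in protein.upper())


# First ten residues of the PETase sequence
print(reverse_translate("MNFPRASRLM"))
```

Because every residue maps to exactly three bases, a 290-residue protein gives the 870-base sequence reported by the tool.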

3.3 Codon optimization

Codon optimization is important for producing the maximum amount of protein from the cell quickly and efficiently. Rare codons have few matching tRNAs, which causes the ribosome, which is responsible for protein synthesis, to stall.

If the ribosome stalls, this can lead to structurally abnormal proteins that lose their proper three-dimensional shape, leading to dysfunction in protein activity.

I chose E. coli as its genome is well documented in the literature, it grows very fast, and it is very frequently used as a host for recombinant DNA technology. Moreover, the DNA sequence can be changed to match the translation machinery of E. coli, allowing researchers and scientists to produce PETase more quickly and in higher amounts.

Codon optimised sequence for PETase:

ATGAACTTTCCACGTGCCTCCCGTCTGATGCAGGCAGCTGTGCTGGGTGGCCTGATGGCG
GTTAGCGCCGCAGCAACTGCTCAGACCAATCCGTACGCGCGTGGCCCGAACCCGACTGCC
GCGAGCCTGGAGGCCAGCGCAGGTCCGTTCACCGTACGCAGCTTTACCGTGAGCCGTCCG
AGCGGCTACGGCGCAGGTACCGTGTATTACCCGACCAACGCAGGCGGTACCGTAGGCGCA
ATTGCGATTGTGCCGGGCTATACCGCACGCCAGAGCTCCATCAAGTGGTGGGGCCCGCGT
CTGGCCAGCCACGGTTTCGTAGTGATCACCATCGATACCAACAGCACCCTGGATCAGCCG
AGCTCCCGTAGCTCCCAGCAGATGGCAGCACTGCGTCAGGTGGCATCCCTGAACGGTACC
AGCTCCAGCCCGATCTACGGCAAGGTAGATACCGCACGTATGGGCGTGATGGGTTGGAGC
ATGGGCGGTGGCGGCAGCCTGATCTCCGCAGCAAACAACCCGAGCCTGAAAGCAGCAGCA
CCGCAGGCACCGTGGGATAGCTCCACCAACTTCAGCTCCGTGACCGTGCCGACCCTGATC
TTCGCATGCGAAAACGATAGCATCGCACCGGTGAACAGCTCCGCACTGCCGATCTACGAT
AGCATGAGCCGTAACGCGAAACAGTTCCTGGAAATCAACGGTGGCTCCCACAGCTGCGCC
AACAGCGGCAACAGCAACCAGGCATTGATCGGCAAGAAAGGTGTGGCCTGGATGAAACGT
TTCATGGATAACGATACCCGCTACTCCACCTTCGCATGCGAAAACCCGAACAGCACCCGT
GTGAGCGATTTCCGCACCGCAAACTGCAGC
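A useful sanity check on any codon-optimised sequence is that it still translates to the same protein as the unoptimised one. The sketch below builds the standard genetic code table from the conventional T, C, A, G ordering and compares the first 30 bases of the two sequences above; `translate` is an illustrative helper, not part of any tool used in the homework.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


reverse_translated = "atgaactttccgcgcgcgagccgcctgatg"  # first 30 bases, section 3.2
codon_optimised = "ATGAACTTTCCACGTGCCTCCCGTCTGATG"     # first 30 bases, section 3.3
print(translate(reverse_translated))
print(translate(codon_optimised))
print(translate(reverse_translated) == translate(codon_optimised))
```

The same comparison can be run on the full 870-base sequences by pasting them in as single strings.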

3.4. You have a sequence! Now what?

What technologies could be used to produce this protein from your DNA? Describe in your words how the DNA sequence can be transcribed and translated into your protein. You may describe either cell-dependent or cell-free methods, or both.

Firstly, I would use a cell-dependent protein production method like recombinant DNA technology. This would involve inserting the optimised DNA into a plasmid, using recombinases and ligases to cut and join the DNA. The plasmid would be transformed into the E. coli culture, the bacteria would be grown, and then we can use IPTG to induce recombinant protein expression in the cells. The cells will be lysed and we will isolate and purify the PETase protein (Schütz et al., 2023).

Another technology I could use is a cell-free protein production method. This would involve growing cells to a high concentration, bursting them open using high pressure and then using centrifugation to isolate the protein synthesis machinery, such as ribosomes and enzymes. Finally, we can add the optimised DNA to the supernatant containing this machinery, which results in PETase being produced.

Part 4: Prepare a Twist DNA Synthesis Order

https://benchling.com/s/seq-oY3lJJH8RajCqnHCK4RG?m=slm-GAs1mnavmWg4pBOHzVnp

.fasta file for TA review :)

>PETase_2
TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGCCATTAAAGAGGAGAAAGGTACCATGAACTTTCCGCGCGCGAGCC
GCCTGATGCAGGCGGCGGTGCTGGGCGGCCTGATGGCGGTGAGCGCGGCGGCGACCGCGCAGACCAACCCGTATGCGCG
CGGCCCGAACCCGACCGCGGCGAGCCTGGAAGCGAGCGCGGGCCCGTTTACCGTGCGCAGCTTTACCGTGAGCCGCCCG
AGCGGCTATGGCGCGGGCACCGTGTATTATCCGACCAACGCGGGCGGCACCGTGGGCGCGATTGCGATTGTGCCGGGCT
ATACCGCGCGCCAGAGCAGCATTAAATGGTGGGGCCCGCGCCTGGCGAGCCATGGCTTTGTGGTGATTACCATTGATAC
CAACAGCACCCTGGATCAGCCGAGCAGCCGCAGCAGCCAGCAGATGGCGGCGCTGCGCCAGGTGGCGAGCCTGAACGGC
ACCAGCAGCAGCCCGATTTATGGCAAAGTGGATACCGCGCGCATGGGCGTGATGGGCTGGAGCATGGGCGGCGGCGGCA
GCCTGATTAGCGCGGCGAACAACCCGAGCCTGAAAGCGGCGGCGCCGCAGGCGCCGTGGGATAGCAGCACCAACTTTAG
CAGCGTGACCGTGCCGACCCTGATTTTTGCGTGCGAAAACGATAGCATTGCGCCGGTGAACAGCAGCGCGCTGCCGATT
TATGATAGCATGAGCCGCAACGCGAAACAGTTTCTGGAAATTAACGGCGGCAGCCATAGCTGCGCGAACAGCGGCAACA
GCAACCAGGCGCTGATTGGCAAAAAAGGCGTGGCGTGGATGAAACGCTTTATGGATAACGATACCCGCTATAGCACCTT
TGCGTGCGAAAACCCGAACAGCACCCGCGTGAGCGATTTTCGCACCGCGAACTGCAGCCATCACCATCACCATCATCAC
TAACCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGC
TCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATAATA

4.6. Choose Your Vector

https://benchling.com/s/seq-qZFBnXDT1X5Wyl0Mw1YJ?m=slm-FoxAu2Oeo4nJoQJoTZMP

Part 5: DNA Read/Write/Edit

5.1 DNA Read

(i) What DNA would you want to sequence (e.g., read) and why? This could be DNA related to human health (e.g. genes related to disease research), environmental monitoring (e.g., sewage waste water, biodiversity analysis), and beyond (e.g. DNA data storage, biobank).

I would want to sequence the gene for the primary surface protein (the capsid protein that largely determines immunogenicity) of a common virus, such as a locally circulating variant of norovirus.

Why I would sequence it:

If I sequence the gene for the major capsid protein, VP1, which is encoded by Open Reading Frame 2 (ORF2), from environmental samples (such as water bodies), we can detect the virus in a city before it spreads widely. This acts as an early warning system.

Viruses mutate quickly, so sequencing allows us to monitor viral evolution over time and see whether the virus is becoming more transmissible or has developed resistance to antiviral medications.

To design a vaccine or treatment, scientists require the genetic code of the rapidly mutating viral surface proteins that trigger the immune response. Sequencing tells us which amino acids make up the protein, so that a replica can be made to trigger the immune response.

Sequencing can also distinguish between variants, for example a normal strain from a mutated strain, so changes are detected quickly and patients can be treated early.

(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

Also answer the following questions:

Is your method first-, second- or third-generation or other? How so? What is your input? How do you prepare your input (e.g. fragmentation, adapter ligation, PCR)? List the essential steps. What are the essential steps of your chosen sequencing technology, how does it decode the bases of your DNA sample (base calling)? What is the output of your chosen sequencing technology?

Nanopore sequencing, which is a third-generation method, would be the technology I would choose because it can sequence extremely long DNA fragments and because data is analysed in real time as each strand passes through the pore, giving actionable insights for pathogen surveillance and for tracking mutations in viruses. It also does not require PCR amplification, as it reads individual molecules directly (MacKenzie and Argyropoulos, 2023).

My input will be genomic DNA or RNA taken from a viral sample. Norovirus has an RNA genome, so the RNA would either be reverse transcribed into cDNA first or read with a direct RNA sequencing kit.

Preparation of input: DNA can be fragmented into the lengths required, although Nanopore sequencing can also handle long, unfragmented molecules.

The ends of the DNA are repaired to make them blunt.

Adapters are ligated to the ends of the DNA; these guide the DNA into the nanopore.

A tether molecule is added to help the DNA find the pore on the sensor.

Nanopore sequencing is different in that it uses electricity instead of chemicals or light to detect bases.

The DNA is passed through a nanopore embedded in an electrically resistant membrane, and a constant ionic current flows through the pore. As the DNA strand moves through the pore, the bases A, T, C and G each block the opening in a different way.

Because each base has a different shape and size, it causes a characteristic disruption in the electrical current. Deep learning models analyse these electrical shifts over time to determine the base sequence.

The raw output is an electrical signal file, commonly in .fast5 or .pod5 format.

After base calling, the processed output of these signal traces (the "squiggles") is a FASTQ file. It contains the base sequence (A, T, C, G) and a quality score that indicates how confident the machine was about each base it read (MacKenzie and Argyropoulos, 2023).
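
To make the per-base confidence in a FASTQ file concrete, here is a minimal Python sketch that decodes the quality line of one record. It assumes the standard Phred+33 encoding; the record itself and the function names are made up for illustration and are not real sequencing data.

```python
# Minimal sketch: decode per-base confidence from one FASTQ record.
# The record below is a made-up example, not real sequencing data.

def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string into integer Phred scores (Phred+33)."""
    return [ord(ch) - offset for ch in quality_line]

def error_probability(q):
    """Phred definition: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)

record = """@toy_read_1
ACGT
+
!+5I"""

header, bases, _, quality = record.splitlines()
scores = phred_scores(quality)

for base, q in zip(bases, scores):
    print(f"{base}  Q{q:<2}  P(error) = {error_probability(q):.4f}")
```

A score of Q20 corresponds to a 1 in 100 chance that the base call is wrong, so low-scoring bases are the ones to treat with caution when calling mutations.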

5.2 DNA Write

(i) What DNA would you want to synthesize (e.g., write) and why? These could be individual genes, clusters of genes or genetic circuits, whole genomes, and beyond. As described in class thus far, applications could range from therapeutics and drug discovery (e.g., mRNA vaccines and therapies) to novel biomaterials (e.g. structural proteins), to sensors (e.g., genetic circuits for sensing and responding to inflammation, environmental stimuli, etc.), to art (DNA origamis). If possible, include the specific genetic sequence(s) of what you would like to synthesize! You will have the opportunity to actually have Twist synthesize these DNA constructs! :)

I would like to synthesise the DNA for a combinatorial nanobody display library (VHH domains). In drug discovery, antibodies have unparalleled specificity and high binding strength to their target, and they can modulate disease-related proteins, which positions them as directed, efficacious therapies in cancer medicine, immunology and infectious diseases. Viruses carry antigenic markers (epitopes): distinct, localised parts of a foreign molecule that are directly recognised and bound by antibodies, B cells or T cells of the immune system. Conventional antibodies are, however, large, which makes it difficult for them to reach some of these epitopes and neutralise the viral particle or cancer cell. This is where nanobodies come in: they are small, can be highly resistant to heat, extreme pH and protein-cleaving enzymes, and can reach concealed epitopes on viruses or cancer cells that larger human antibodies cannot (Muyldermans, 2021).

(ii) What technology or technologies would you use to perform this DNA synthesis and why? Also answer the following questions:

What are the essential steps of your chosen sequencing methods? What are the limitations of your sequencing method (if any) in terms of speed, accuracy, scalability?

Answer: I plan to use trinucleotide-directed mutagenesis to produce antibody diversity by creating mutations in the complementarity-determining regions (CDRs) whilst avoiding stop codons and frameshift variants.

We will use PCR to generate diversity in the CDR3 region, a very diverse and flexible region that contacts the virus epitope, by creating different random combinations of bases and so producing many different nanobody genes.

Phages will be used to display the nanobodies encoded by this DNA on their surface. The disease-related (e.g. viral) target proteins are mixed with the phages, and only phages displaying nanobodies with specificity and affinity for that protein remain bound.

We will then use next-generation sequencing and artificial intelligence to analyse the remaining phages and iteratively find better versions.

Essential steps:

  • We divide the nanobody gene into its constituent parts: CDR1, CDR2, CDR3 and the framework regions.
  • We randomise the CDR3 part: during PCR, primers insert random DNA bases at the antigen-binding sites.
  • The DNA parts have overlapping ends, so the fragments can be stitched together in the correct order.
  • We use a DNA polymerase for amplification, creating a complete, double-stranded nanobody gene.
  • The DNA sequence must be expressed efficiently in the organism it grows in, so codon optimisation for the host organism is important.
  • We create mutations only in selected regions, in a way that does not create stop codons or frameshift mutations (Webmaster, 2024).
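
The idea of randomising whole codons rather than single bases can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the CDR3 length of 12 codons, the function names and the use of all 61 sense codons are hypothetical choices (a real trinucleotide mix would typically use one chosen codon per amino acid).

```python
import random

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}

# All 61 sense codons: every triplet except the three stop codons.
SENSE_CODONS = [a + b + c for a in BASES for b in BASES for c in BASES
                if a + b + c not in STOP_CODONS]

def random_cdr3(n_codons, rng):
    """Build a random CDR3 coding segment one whole codon at a time."""
    return "".join(rng.choice(SENSE_CODONS) for _ in range(n_codons))

def has_stop(seq):
    """True if any in-frame codon of seq is a stop codon."""
    return any(seq[i:i + 3] in STOP_CODONS for i in range(0, len(seq), 3))

rng = random.Random(0)  # fixed seed so the toy library is reproducible
library = [random_cdr3(12, rng) for _ in range(5)]  # 12 codons is an assumed length
for variant in library:
    print(variant, "stop codon present:", has_stop(variant))
```

Because every position is drawn as a complete codon, each variant keeps the reading frame and never contains an in-frame stop codon, which is the property the last step above relies on.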

The Inputs:

DNA template: VHH gene.

Primers: Short, bespoke DNA strands. Some will match the VHH gene scaffold, whilst others will contain random mutations.

Enzymes: DNA Polymerase for building DNA and Restriction Enzymes to cut the DNA for plasmid insertion.

Plasmid vectors: the plasmid vector will carry the new library into cells.

E. coli cells: will help to replicate the nanobody gene library.

  1. Limitations of the Method
    1. E. coli transformation efficiency - how much VHH DNA can be transformed into cells limits the size of the library.

    2. Bias of primers - we may get less diverse randomised DNA, as the same bases can be incorporated again and again.

    3. PCR amplification may introduce errors whilst the DNA is being copied.

    4. Randomising DNA may introduce stop codons. If one occurs in the middle of the nanobody gene, translation stops halfway through and we end up with a truncated, non-functional fragment.
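
The stop codon limitation can be put into numbers with a short Python sketch. It compares fully random codons (NNN) with the common NNK scheme, where the third base is restricted to G or T; the 12-codon CDR3 length and the function name are assumptions made only for this example.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}

def stop_free_fraction(third_position_bases, n_codons):
    """Fraction of random n-codon segments with no in-frame stop codon."""
    codons = ["".join(c) for c in product("ACGT", "ACGT", third_position_bases)]
    p_sense = sum(c not in STOP_CODONS for c in codons) / len(codons)
    return p_sense ** n_codons

# 12 randomised codons is an assumed CDR3 length, used only for illustration.
for scheme, third in [("NNN", "ACGT"), ("NNK", "GT")]:
    print(scheme, f"{stop_free_fraction(third, 12):.3f}")
```

With NNN, 3 of the 64 possible codons are stops, whereas NNK leaves only TAG among its 32 codons, so a larger share of the library stays full length. Trinucleotide-based synthesis avoids the problem entirely because stop codons are never offered.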

5.3 DNA Edit

(i) What DNA would you want to edit and why?

In class, George shared a variety of ways to edit the genes and genomes of humans and other organisms. Such DNA editing technologies have profound implications for human health, development, and even human longevity and human augmentation. DNA editing is also already commonly leveraged for flora and fauna, for example in nature conservation efforts, (animal/plant restoration, de-extinction), or in agriculture (e.g. plant breeding, nitrogen fixation). What kinds of edits might you want to make to DNA (e.g., human genomes and beyond) and why?

I would choose to edit out the highly conserved long terminal repeats (LTRs), which flank the integrated HIV genome and act as promoter and regulatory sequences for viral transcription and replication. HIV integrates into host CD4 T helper cells, which help the adaptive arm of the immune system to recognise and destroy infected or cancerous cells, control immune responses and provide long-lasting immunity (Hu & Hughes, 2012). HIV affects millions of people worldwide, and neither antiretroviral therapy nor the immune system can easily clear the virus, as it remains dormant in host T cells. Furthermore, antiretroviral resistance may develop over time, and patients must adhere strictly to their medication to prevent further cycles of viral replication (Hu & Hughes, 2012).

(ii) What technology or technologies would you use to perform these DNA edits and why?

I would use the CRISPR-Cas9 gene editing system to edit out the long terminal repeats, which are identical at both ends of the provirus, and so remove HIV from the CD4 T cells. This would stop the dormant, latent virus reservoir from growing, and because the LTRs are highly conserved we can use one sgRNA to target both (Hu & Hughes, 2012; Wang et al., 2018).

Also answer the following questions:

How does your technology of choice edit DNA? What are the essential steps?

CRISPR-Cas9 works as follows:

1 - We make sgRNAs that are complementary to the LTR target sequence (at the 5’ and 3’ LTR ends of the integrated HIV genome).

2 - The sgRNAs bind the Cas9 nuclease and direct it to the matching DNA sequences within the HIV-1 LTR, where it produces double-strand breaks.

3 - The double-strand breaks trigger the cell’s repair mechanisms (non-homologous end joining), which introduce indel mutations. These inactivate the virus by disrupting viral transcription at the promoter region, which in turn prevents HIV replication (Asmamaw & Zawdie, 2021).

What preparation do you need to do (e.g. design steps) and what is the input (e.g. DNA template, enzymes, plasmids, primers, guides, cells) for the editing?

  1. Preparation and design.

Firstly, we must find an LTR region that is conserved across different HIV strains to target with our sgRNAs, while avoiding targeting and cleavage of the host’s own DNA.

Secondly, the LTR target sequence can mutate rapidly in HIV, so we should use a database of known mutant sequences and design our sgRNAs for different mutant strains.

Thirdly, we will perform off-target screening to prevent damage to host DNA: we must make sure the guide sequence does not match a human gene, particularly near the PAM, which Cas9 requires in order to cleave the target DNA and which lies 3-4 nucleotides downstream of the cut site.

We also have to look within the LTR, which is composed of three subsegments (U3, R and U5), for areas that are important for viral transcription and replication, and therefore survival. Previous research suggests that the NF-κB binding sites (NF-κB is an important transcription factor which regulates cell growth and plays a role in cell development), the TATA box (a highly conserved promoter element that binds the TATA-binding protein to initiate transcription) and TAR (which is bound by the viral Tat protein to activate viral transcription) may be important (Wang et al., 2018).

We must then look for a PAM next to each candidate target.

Hairpin structures in the sgRNA may prevent it from binding the Cas9 nuclease, so we can use in silico tools like Mfold to check whether our sgRNA folds on itself (Asmamaw & Zawdie, 2021).
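
The PAM search described above can be sketched in Python. This is a minimal illustration, assuming the SpCas9 PAM (NGG) and a 20-nt protospacer; the function names and the toy sequence are made up, and the sequence is not a real HIV LTR.

```python
import re

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_spcas9_sites(seq):
    """Return (strand, start, protospacer, pam) for every 20-nt site followed by NGG.

    For the "-" strand, start is counted along the reverse complement.
    """
    seq = seq.upper()
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # A lookahead is used so that overlapping sites are all reported.
        for m in re.finditer(r"(?=([ACGT]{20})([ACGT]GG))", s):
            sites.append((strand, m.start(), m.group(1), m.group(2)))
    return sites

toy_target = "TTGACCTGAAGCTAGCTAGGATCCGATCGGTACCATGGCTTAA"  # made-up sequence
for site in find_spcas9_sites(toy_target):
    print(site)
```

Each candidate protospacer found this way would still need the conservation, off-target and hairpin checks listed above before being ordered.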

Physical inputs including ordering/creating components such as:

  • Custom sgRNA oligos with a 20-nt targeting sequence, ordered from a vendor who can synthesise the RNA or a DNA template for it.
  • A plasmid for use in cell culture (pLentiCRISPR v2), into which the guide sequence is cloned next to the sgRNA scaffold so that the plasmid expresses both the Cas9 nuclease and the sgRNA.
  • A pair of forward and reverse primers, which will be positioned outside of the target site.
  • After editing, we will do PCR. One primer will be placed before the 5’ LTR and one after the 3’ LTR, so PCR amplifies only the DNA between those two primers.
  • We will use gel electrophoresis. The unedited HIV DNA control gives a large fragment that stays near the top of the gel, whereas a successfully edited genome gives a much smaller band positioned much lower down on the gel (Asmamaw & Zawdie, 2021).
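
To show why the edited band runs lower on the gel, here is a small Python sketch that computes the expected PCR product size with and without the excision. All coordinates are hypothetical values on a toy locus, chosen only for illustration, and the function name is my own.

```python
def expected_band(fwd_primer_start, rev_primer_end, excised=None):
    """Expected PCR product length in bp between the outer ends of two primers.

    excised: optional (start, end) of the segment removed by editing, end-exclusive.
    """
    length = rev_primer_end - fwd_primer_start
    if excised is not None:
        start, end = excised
        if not (fwd_primer_start <= start <= end <= rev_primer_end):
            raise ValueError("excised segment must lie between the primers")
        length -= end - start
    return length

# Hypothetical coordinates on a toy locus, not measured values.
unedited = expected_band(0, 10_000)
edited = expected_band(0, 10_000, excised=(300, 9_700))
print(f"unedited control: {unedited} bp, edited: {edited} bp")
```

Shorter DNA fragments migrate further through the gel, so the much smaller edited product appears well below the unedited control.
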
  2. Efficiency versus Precision

Efficiency: In a lab environment, CRISPR-Cas9 editing may be successful in cell culture, but the living body is far more complex. Getting the Cas9 enzyme to cleave LTRs in tissues such as the brain or bone marrow is still in its infancy, and off-target cleavage and DNA damage could have dangerous consequences.

Precision: Using an engineered high-fidelity Cas9 nuclease, which has improved targeting specificity and tolerates fewer mismatches between the guide and the DNA, should significantly reduce the risk of damage to the human genome.

Since we have to target both the 5’ and 3’ LTRs, it is important to check that the guide sequence is present at both ends in order to excise the integrated HIV genome from host CD4 T cells or other immune cells like macrophages (Hunt et al., 2023).

What are the limitations of your editing methods (if any) in terms of efficiency or precision?

Mutations may occur in the seed region, the PAM-proximal 3’ end of the target site that is important for Cas9 recognition and binding to the target DNA. Such mutations could prevent cleavage by the Cas9/gRNA complex and therefore the suppression of viral replication.
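
A quick way to flag this risk is to compare the guide with a mutated target and see whether any mismatch falls in the seed. The Python sketch below is only illustrative: the seed length of 10 nt is an assumption (values of roughly 10-12 nt are commonly quoted), and the sequences and function names are made up.

```python
def mismatch_positions(guide, target):
    """1-based positions, counted from the PAM-proximal (3') end, where two 20-mers differ."""
    if len(guide) != len(target):
        raise ValueError("guide and target must be the same length")
    n = len(guide)
    return [n - i for i, (g, t) in enumerate(zip(guide, target)) if g != t]

def seed_is_intact(guide, target, seed_len=10):
    """True if no mismatch falls within the assumed PAM-proximal seed region."""
    return all(pos > seed_len for pos in mismatch_positions(guide, target))

guide = "ACGTACGTACGTACGTACGT"       # made-up 20-nt guide
distal_mutant = "T" + guide[1:]      # mismatch at the PAM-distal 5' end
seed_mutant = guide[:-1] + "A"       # mismatch next to the PAM

print("distal mutant, seed intact:", seed_is_intact(guide, distal_mutant))
print("seed mutant, seed intact:", seed_is_intact(guide, seed_mutant))
```

A viral variant that fails this check would be a candidate for a redesigned or additional sgRNA.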

References

Asmamaw, M., & Zawdie, B. (2021). Mechanism and applications of CRISPR/Cas-9-mediated genome editing. Biologics: Targets & Therapy, 15, 353–361. https://doi.org/10.2147/BTT.S326422

Wang, G., Zhao, N., Berkhout, B., & Das, A. T. (2018). CRISPR-Cas based antiviral strategies against HIV-1. Virus Research, 244, 321–332. https://doi.org/10.1016/j.virusres.2017.07.020

Muyldermans, S. (2021). A guide to: Generation and design of nanobodies. The FEBS Journal, 288(7), 2084–2102. https://doi.org/10.1111/febs.15515

Webmaster, I. (2024, November 18). Synthetic biology technologies for antibody discovery. Isogenica. https://isogenica.com/synthetic-biology-technologies-for-antibody-discovery/

MacKenzie, M., & Argyropoulos, C. (2023). An introduction to nanopore sequencing: Past, present, and future considerations. Micromachines, 14(2), 459. https://doi.org/10.3390/mi14020459

Hu, W.-S., & Hughes, S. H. (2012). HIV-1 reverse transcription. Cold Spring Harbor Perspectives in Medicine, 2(10), a006882. https://doi.org/10.1101/cshperspect.a006882

Schütz, A., Bernhard, F., Berrow, N., Buyel, J. F., Ferreira-da-Silva, F., Haustraete, J., van den Heuvel, J., Hoffmann, J.-E., de Marco, A., Peleg, Y., Suppmann, S., Unger, T., Vanhoucke, M., Witt, S., & Remans, K. (2023). A concise guide to choosing suitable gene expression systems for recombinant protein production. STAR Protocols, 4(4), 102572. https://doi.org/10.1016/j.xpro.2023.102572

Hunt, J. M. T., Samson, C. A., Rand, A. du, & Sheppard, H. M. (2023). Unintended CRISPR-Cas9 editing outcomes: A review of the detection and prevalence of structural variants generated by gene-editing in human cells. Human Genetics, 142(6), 705–720. https://doi.org/10.1007/s00439-023-02561-1

Week 3 HW: Lab Automation

Assignment: Python Script for Opentrons Artwork — DUE BY YOUR LAB TIME!

A Blooming Daisy Flower PINK, PURPLE & BLUE DESIGN! :) [images: final design]

INITIAL DESIGN: [images: initial design]

Python documentation

from opentrons import types

metadata = {
    'author': 'Tammy Sisodiya',
    'protocolName': 'HTGAA Dazzling Daisy',
    'description': 'A blooming Daisy flower in Purple, Pink, and Blue.',
    'source': 'HTGAA 2026 Opentrons Lab',
    'apiLevel': '2.20'
}

##############################################################################
###   Robot deck setup constants
##############################################################################

TIP_RACK_DECK_SLOT = 9
COLORS_DECK_SLOT = 6
AGAR_DECK_SLOT = 5
PIPETTE_STARTING_TIP_WELL = 'A1'

# UPDATED: Mapping the new lab colors to source wells
well_colors = {
    'A1' : 'Purple',
    'B1' : 'Pink',
    'C1' : 'Blue'
}

def run(protocol):
  # Tips
  tips_20ul = protocol.load_labware('opentrons_96_tiprack_20ul', TIP_RACK_DECK_SLOT, 'Opentrons 20uL Tips')

  # Pipettes
  pipette_20ul = protocol.load_instrument("p20_single_gen2", "right", [tips_20ul])

  # Modules
  temperature_module = protocol.load_module('temperature module gen2', COLORS_DECK_SLOT)
  temperature_plate = temperature_module.load_labware('opentrons_96_aluminumblock_generic_pcr_strip_200ul', 'Cold Plate')
  color_plate = temperature_plate

  # Agar Plate
  agar_plate = protocol.load_labware('htgaa_agar_plate', AGAR_DECK_SLOT, 'Agar Plate')
  center_location = agar_plate['A1'].top()
  pipette_20ul.starting_tip = tips_20ul.well(PIPETTE_STARTING_TIP_WELL)

  # Helper Functions
  def location_of_color(color_string):
    for well,color in well_colors.items():
      if color.lower() == color_string.lower():
        return color_plate[well]
    raise ValueError(f"No well found with color {color_string}")

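  def dispense_and_detach(pipette, volume, location):
      # Approach from 5 mm above, dispense at the target, then lift straight
      # up again so the drop detaches before the next lateral move.
      assert(isinstance(volume, (int, float)))
      # Location.move() applies a relative offset, so z=5 means 5 mm above.
      above_location = location.move(types.Point(z=5))
      pipette.move_to(above_location)
      pipette.dispense(volume, location)
      pipette.move_to(above_location)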

### YOUR DESIGN DATA ###
  sfgfp_points = [(-4.4, 26.4),(1.1, 26.4),(2.2, 26.4),(3.3, 26.4),(4.4, 26.4),(-6.6, 25.3),(-5.5, 25.3),(-4.4, 25.3),(-3.3, 25.3),(1.1, 25.3),(4.4, 25.3),(-12.1, 24.2),(-11, 24.2),(-9.9, 24.2),(-8.8, 24.2),(-7.7, 24.2),(-6.6, 24.2),(-2.2, 24.2),(0, 24.2),(4.4, 24.2),(-13.2, 23.1),(-12.1, 23.1),(-7.7, 23.1),(-6.6, 23.1),(-2.2, 23.1),(4.4, 23.1),(6.6, 23.1),(7.7, 23.1),(8.8, 23.1),(-13.2, 22),(-7.7, 22),(-6.6, 22),(-2.2, 22),(0, 22),(4.4, 22),(5.5, 22),(6.6, 22),(8.8, 22),(-13.2, 20.9),(-7.7, 20.9),(-2.2, 20.9),(3.3, 20.9),(4.4, 20.9),(9.9, 20.9),(-13.2, 19.8),(-7.7, 19.8),(-2.2, 19.8),(3.3, 19.8),(9.9, 19.8),(-13.2, 18.7),(-2.2, 18.7),(8.8, 18.7),(9.9, 18.7),(-13.2, 17.6),(-2.2, 17.6),(-1.1, 17.6),(8.8, 17.6),(-18.7, 16.5),(-17.6, 16.5),(-16.5, 16.5),(-15.4, 16.5),(-14.3, 16.5),(-12.1, 16.5),(-2.2, 16.5),(7.7, 16.5),(8.8, 16.5),(-20.9, 15.4),(-19.8, 15.4),(-18.7, 15.4),(-14.3, 15.4),(-13.2, 15.4),(-12.1, 15.4),(-2.2, 15.4),(-1.1, 15.4),(7.7, 15.4),(-20.9, 14.3),(-13.2, 14.3),(-12.1, 14.3),(-11, 14.3),(-3.3, 14.3),(-2.2, 14.3),(5.5, 14.3),(6.6, 14.3),(-20.9, 13.2),(-9.9, 13.2),(-8.8, 13.2),(-3.3, 13.2),(4.4, 13.2),(5.5, 13.2),(-20.9, 12.1),(-8.8, 12.1),(-7.7, 12.1),(-3.3, 12.1),(-2.2, 12.1),(2.2, 12.1),(3.3, 12.1),(5.5, 12.1),(6.6, 12.1),(7.7, 12.1),(8.8, 12.1),(9.9, 12.1),(11, 12.1),(-20.9, 11),(-19.8, 11),(-6.6, 11),(-5.5, 11),(-4.4, 11),(0, 11),(1.1, 11),(2.2, 11),(3.3, 11),(4.4, 11),(11, 11),(-19.8, 9.9),(-4.4, 9.9),(-3.3, 9.9),(-2.2, 9.9),(-1.1, 9.9),(0, 9.9),(11, 9.9),(-23.1, 8.8),(-22, 8.8),(-20.9, 8.8),(-19.8, 8.8),(-18.7, 8.8),(-17.6, 8.8),(-5.5, 8.8),(-4.4, 8.8),(-3.3, 8.8),(-2.2, 8.8),(-1.1, 8.8),(0, 8.8),(9.9, 8.8),(11, 8.8),(15.4, 8.8),(16.5, 8.8),(17.6, 8.8),(18.7, 8.8),(19.8, 8.8),(20.9, 8.8),(22, 8.8),(23.1, 8.8),(24.2, 8.8),(-23.1, 7.7),(-3.3, 7.7),(-1.1, 7.7),(0, 7.7),(1.1, 7.7),(2.2, 7.7),(8.8, 7.7),(9.9, 7.7),(14.3, 7.7),(15.4, 7.7),(16.5, 7.7),(20.9, 7.7),(22, 7.7),(24.2, 7.7),(-24.2, 6.6),(-23.1, 6.6),(-5.5, 6.6),(-4.4, 6.6),(-3.3, 6.6),(-2.2, 
6.6),(1.1, 6.6),(2.2, 6.6),(7.7, 6.6),(8.8, 6.6),(9.9, 6.6),(11, 6.6),(12.1, 6.6),(13.2, 6.6),(14.3, 6.6),(16.5, 6.6),(19.8, 6.6),(20.9, 6.6),(23.1, 6.6),(-22, 5.5),(-9.9, 5.5),(-7.7, 5.5),(-6.6, 5.5),(-5.5, 5.5),(-4.4, 5.5),(-2.2, 5.5),(3.3, 5.5),(4.4, 5.5),(13.2, 5.5),(16.5, 5.5),(18.7, 5.5),(19.8, 5.5),(23.1, 5.5),(-20.9, 4.4),(-19.8, 4.4),(-18.7, 4.4),(-17.6, 4.4),(-16.5, 4.4),(-15.4, 4.4),(-14.3, 4.4),(-13.2, 4.4),(-12.1, 4.4),(-9.9, 4.4),(-8.8, 4.4),(-7.7, 4.4),(-2.2, 4.4),(4.4, 4.4),(5.5, 4.4),(6.6, 4.4),(13.2, 4.4),(16.5, 4.4),(17.6, 4.4),(18.7, 4.4),(23.1, 4.4),(-12.1, 3.3),(-11, 3.3),(-2.2, 3.3),(5.5, 3.3),(7.7, 3.3),(8.8, 3.3),(9.9, 3.3),(11, 3.3),(12.1, 3.3),(13.2, 3.3),(16.5, 3.3),(17.6, 3.3),(23.1, 3.3),(-13.2, 2.2),(-12.1, 2.2),(-2.2, 2.2),(6.6, 2.2),(9.9, 2.2),(16.5, 2.2),(17.6, 2.2),(18.7, 2.2),(19.8, 2.2),(20.9, 2.2),(22, 2.2),(-14.3, 1.1),(-2.2, 1.1),(7.7, 1.1),(9.9, 1.1),(11, 1.1),(15.4, 1.1),(16.5, 1.1),(22, 1.1),(-15.4, 0),(-14.3, 0),(-2.2, 0),(7.7, 0),(11, 0),(14.3, 0),(15.4, 0),(22, 0),(-15.4, -1.1),(-2.2, -1.1),(3.3, -1.1),(8.8, -1.1),(13.2, -1.1),(14.3, -1.1),(20.9, -1.1),(-15.4, -2.2),(-14.3, -2.2),(-13.2, -2.2),(-12.1, -2.2),(-8.8, -2.2),(-7.7, -2.2),(-6.6, -2.2),(-2.2, -2.2),(3.3, -2.2),(8.8, -2.2),(11, -2.2),(12.1, -2.2),(13.2, -2.2),(20.9, -2.2),(-14.3, -3.3),(-11, -3.3),(-8.8, -3.3),(-7.7, -3.3),(-6.6, -3.3),(-3.3, -3.3),(-2.2, -3.3),(3.3, -3.3),(4.4, -3.3),(8.8, -3.3),(11, -3.3),(19.8, -3.3),(20.9, -3.3),(-13.2, -4.4),(-12.1, -4.4),(-11, -4.4),(-8.8, -4.4),(-3.3, -4.4),(-2.2, -4.4),(4.4, -4.4),(5.5, -4.4),(6.6, -4.4),(7.7, -4.4),(8.8, -4.4),(13.2, -4.4),(19.8, -4.4),(-16.5, -5.5),(-15.4, -5.5),(-14.3, -5.5),(-13.2, -5.5),(-8.8, -5.5),(-4.4, -5.5),(-3.3, -5.5),(-2.2, -5.5),(4.4, -5.5),(14.3, -5.5),(15.4, -5.5),(16.5, -5.5),(18.7, -5.5),(19.8, -5.5),(-19.8, -6.6),(-18.7, -6.6),(-17.6, -6.6),(-16.5, -6.6),(-8.8, -6.6),(-4.4, -6.6),(-1.1, -6.6),(3.3, -6.6),(4.4, -6.6),(17.6, -6.6),(18.7, -6.6),(-23.1, -7.7),(-22, -7.7),(-20.9, 
-7.7),(-19.8, -7.7),(-17.6, -7.7),(-16.5, -7.7),(-15.4, -7.7),(-8.8, -7.7),(-7.7, -7.7),(-6.6, -7.7),(-5.5, -7.7),(-4.4, -7.7),(-3.3, -7.7),(-2.2, -7.7),(-1.1, -7.7),(0, -7.7),(2.2, -7.7),(3.3, -7.7),(16.5, -7.7),(17.6, -7.7),(-24.2, -8.8),(-23.1, -8.8),(-14.3, -8.8),(-13.2, -8.8),(-8.8, -8.8),(-7.7, -8.8),(-3.3, -8.8),(-2.2, -8.8),(0, -8.8),(1.1, -8.8),(2.2, -8.8),(3.3, -8.8),(5.5, -8.8),(14.3, -8.8),(15.4, -8.8),(16.5, -8.8),(-26.4, -9.9),(-25.3, -9.9),(-24.2, -9.9),(-12.1, -9.9),(-11, -9.9),(-9.9, -9.9),(-8.8, -9.9),(-3.3, -9.9),(0, -9.9),(7.7, -9.9),(8.8, -9.9),(11, -9.9),(12.1, -9.9),(13.2, -9.9),(-27.5, -11),(-26.4, -11),(-25.3, -11),(-24.2, -11),(-23.1, -11),(-22, -11),(-20.9, -11),(-19.8, -11),(-18.7, -11),(-17.6, -11),(-16.5, -11),(-15.4, -11),(-14.3, -11),(-13.2, -11),(-12.1, -11),(-11, -11),(-3.3, -11),(0, -11),(-28.6, -12.1),(-27.5, -12.1),(-19.8, -12.1),(-18.7, -12.1),(-17.6, -12.1),(-15.4, -12.1),(-12.1, -12.1),(-4.4, -12.1),(-3.3, -12.1),(-2.2, -12.1),(0, -12.1),(-28.6, -13.2),(-27.5, -13.2),(-20.9, -13.2),(-19.8, -13.2),(-12.1, -13.2),(-4.4, -13.2),(0, -13.2),(-26.4, -14.3),(-25.3, -14.3),(-13.2, -14.3),(-12.1, -14.3),(-5.5, -14.3),(-2.2, -14.3),(0, -14.3),(-23.1, -15.4),(-20.9, -15.4),(-19.8, -15.4),(-13.2, -15.4),(-8.8, -15.4),(-7.7, -15.4),(-6.6, -15.4),(-1.1, -15.4),(0, -15.4),(-18.7, -16.5),(-16.5, -16.5),(-15.4, -16.5),(-14.3, -16.5),(-13.2, -16.5),(-12.1, -16.5),(-9.9, -16.5),(-8.8, -16.5),(-2.2, -16.5),(0, -16.5),(-2.2, -17.6),(0, -17.6),(0, -18.7),(-1.1, -19.8),(1.1, -19.8),(-1.1, -20.9),(1.1, -20.9),(2.2, -20.9),(0, -22),(3.3, -22),(12.1, -22),(13.2, -22),(14.3, -22),(15.4, -22),(0, -23.1),(4.4, -23.1),(5.5, -23.1),(9.9, -23.1),(11, -23.1),(12.1, -23.1),(13.2, -23.1),(1.1, -24.2),(2.2, -24.2),(5.5, -24.2),(6.6, -24.2),(7.7, -24.2),(8.8, -24.2),(9.9, -24.2),(11, -24.2),(12.1, -24.2),(2.2, -25.3),(3.3, -25.3),(4.4, -25.3),(9.9, -25.3),(11, -25.3),(5.5, -26.4),(6.6, -26.4),(7.7, -26.4),(8.8, -26.4)]
  mrfp1_points = [(-15.4, 12.1),(-14.3, 12.1),(-14.3, 11),(-13.2, 11),(-12.1, 11)]
  mscarlet_i_points = [(-11, 20.9),(-9.9, 20.9),(-11, 19.8),(-9.9, 19.8),(-9.9, 18.7)]
  mko2_points = [(3.3, 18.7),(4.4, 18.7),(5.5, 18.7),(6.6, 18.7),(4.4, 17.6)]
  mjuniper_points = [(6.6, 9.9),(7.7, 9.9),(4.4, 8.8),(5.5, 8.8),(6.6, 8.8),(7.7, 8.8),(-6.6, 2.2),(-9.9, 1.1),(-8.8, 1.1),(-7.7, 1.1),(-6.6, 1.1)]
  electra2_points = [(1.1, 4.4),(1.1, 3.3),(1.1, 2.2),(1.1, 1.1)]

  # 2. UPDATED Design Mapping
  # Purple for the large petals, Pink for highlights, Blue for details.
  layers = [
      ('Purple', sfgfp_points),
      ('Pink', mrfp1_points),
      ('Pink', mscarlet_i_points),
      ('Pink', mko2_points),
      ('Blue', mjuniper_points),
      ('Blue', electra2_points)
  ]

  # 3. Execution Loop
  drop_vol = 1.0

  for color_name, points in layers:
      if not points:
          continue

      source_well = location_of_color(color_name)

      for i in range(0, len(points), 15):
          chunk = points[i:i + 15]
          pipette_20ul.pick_up_tip()

          aspirate_vol = (len(chunk) * drop_vol) + 2.0
          if aspirate_vol > 20.0:
              aspirate_vol = 20.0

          pipette_20ul.aspirate(aspirate_vol, source_well)

          for x, y in chunk:
              if (x**2 + y**2) < 1600:
                  target_point = center_location.point + types.Point(x=x, y=y, z=0)
                  target_loc = types.Location(target_point, None)
                  dispense_and_detach(pipette_20ul, drop_vol, target_loc)

          # Return residual to source well top to avoid contamination
          if pipette_20ul.current_volume > 0:
              pipette_20ul.dispense(pipette_20ul.current_volume, source_well.top())

          pipette_20ul.drop_tip()
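
The chunking logic in the execution loop can be checked without a robot. The sketch below is a plain-Python restatement of that loop, with the same defaults (15 points per tip, 1 uL per drop, 2 uL extra, 20 uL cap); the function name plan_transfers and the toy points are my own and are not part of the protocol.

```python
def plan_transfers(points, chunk_size=15, drop_vol=1.0, extra_vol=2.0, max_vol=20.0):
    """Split points into per-tip chunks and compute the volume to aspirate for each."""
    plan = []
    for i in range(0, len(points), chunk_size):
        chunk = points[i:i + chunk_size]
        volume = min(len(chunk) * drop_vol + extra_vol, max_vol)
        plan.append((chunk, volume))
    return plan

toy_points = [(float(x), 0.0) for x in range(32)]  # 32 made-up points
plan = plan_transfers(toy_points)
print("tips used:", len(plan))
print("aspirate volumes (uL):", [vol for _, vol in plan])
```

Running the same function on each colour's point list gives the number of tips and the volume of each colour the design needs before anything is loaded onto the deck. Note that the protocol skips points outside the 40 mm radius only at dispense time, so the aspirated volume still counts them.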

Post-Lab Questions — DUE BY START OF FEB 24 LECTURE

  1. I found a paper which uses Opentrons-2 liquid handling to mix and set up protein crystallization plates, using Hen Egg White Lysozyme (HEWL) as a model system to validate the robot’s capabilities, together with a periplasmic protein (which is used as a framework/scaffold for large batches of similarly structured crystals). Protein crystallization aids 3D structural determination of proteins by X-ray crystallography. If we know the details of a protein’s structure, interactions and function, we can use them to drive the design, modelling and optimization of new drugs that fit the protein’s active site. The paper showed how the researchers automated the setup of 24-well sitting drop plates for these two proteins. Compared to manual pipetting of plates, the Opentrons robot decreased manual labour, increased the reliability of pipetting, and potentially decreased person-to-person variability across plates (DeRoo et al., 2025).

2) Write a description about what you intend to do with automation tools for your final project. You may include example pseudocode, Python scripts, 3D printed holders, a plan for how to use Ginkgo Nebula, and more. You may reference this week’s recitation slide deck for lab automation details.

For my idea 1: An Opentrons protocol can be used to mix different PETase enzyme mutants / variants with 5 different plastic types in a 96-well plate.
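
As a first step towards that protocol, the plate layout can be generated in plain Python before any robot code is written. This is a sketch under my own assumptions: the variant names are placeholders, only PET is a named plastic so far, and wells are filled row by row.

```python
from itertools import product

ROWS = "ABCDEFGH"
COLUMNS = range(1, 13)

def plate_map(variants, plastics):
    """Assign each (variant, plastic) pair to a well, filling the plate row by row."""
    wells = [f"{r}{c}" for r in ROWS for c in COLUMNS]
    pairs = list(product(variants, plastics))
    if len(pairs) > len(wells):
        raise ValueError(f"{len(pairs)} conditions do not fit in a 96-well plate")
    return dict(zip(wells, pairs))

variants = [f"variant_{i}" for i in range(1, 20)]  # hypothetical variant names
plastics = ["PET", "plastic_2", "plastic_3", "plastic_4", "plastic_5"]  # placeholders
layout = plate_map(variants, plastics)

for well in ("A1", "A5", "A6"):
    print(well, layout[well])
```

With 5 plastics, at most 19 variants fit on one plate without replicates, which leaves one well free for a control.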

A random idea I thought of initially, as a small side project within my original idea 1: Measure PETase degradation rate

I would like to screen for stable PETase variants by using Ginkgo Nebula to design a library of 50-100 PETase mutants, specifically targeting the W159H/S238F double mutation, which has been shown to improve stabilisation and binding affinity to PET in Ideonella sakaiensis PETase (Austin et al., 2018).

The steps I will take to get them sequenced and produced by the foundry: 1) ensure codon optimisation of the DNA sequence variants and 2) obtain them as 96-well plates or expression-ready plasmids.

I hope to develop 3D printed lab equipment, such as a clamping insert for PET films, which will stop the films from floating and keep them submerged, so that the Opentrons pipette deposits the enzyme directly on the film surface.

I would then like to automate the measurement of how fast PETase degrades plastic, optimising the conditions for PETase activity, as it depends on temperature, pH and NaCl concentration. We can use a multiplexed bioreactor array with precise control over pH, temperature, oxygen transfer and nutrient feeding in its chambers, to allow many parallel experiments (cell cultures) to be conducted at the same time. I would like to use hardware such as an Arduino / ESP32 to control 4–8 small vials, each with its own heating element and pH probe. We use a control loop to keep the pH constant: when PETase breaks down PET into its monomers, terephthalic acid and ethylene glycol, the solution becomes more acidic. The automation system measures how much base (such as NaOH) it has to pump in to neutralise the acid. The quantity of base added then serves as a proxy for the plastic degradation rate, which can be plotted as a graph.
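
The pH control loop can be prototyped as a simulation before any hardware is wired up. The Python sketch below is a toy model under stated assumptions: the linear relation between unneutralised acid and pH, the setpoint, the dose size and all names are hypothetical, and a real probe reading and pump call would replace the marked lines.

```python
def run_ph_stat(acid_release_umol, setpoint=8.0, deadband=0.05,
                ph_drop_per_umol=0.01, dose_umol=5.0):
    """Toy pH-stat: dose NaOH whenever the modelled pH falls below the deadband.

    acid_release_umol: acid equivalents released in each time step (made-up input).
    Returns the cumulative NaOH dosed after each step, in micromoles.
    """
    unneutralised = 0.0  # acid not yet neutralised, in umol
    total_base = 0.0
    history = []
    for released in acid_release_umol:
        unneutralised += released
        # Toy linear sensor model; a real probe reading would replace this line.
        ph = setpoint - ph_drop_per_umol * unneutralised
        while ph < setpoint - deadband:
            # A real system would trigger the NaOH pump here.
            unneutralised -= dose_umol
            total_base += dose_umol
            ph = setpoint - ph_drop_per_umol * unneutralised
        history.append(total_base)
    return history

release = [2.0] * 10  # made-up acid release per time step
history = run_ph_stat(release)
print("cumulative NaOH dosed (umol):", history)
```

Plotting the returned history against time gives the curve described above, and its slope is the proxy for the degradation rate.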

For my idea 2: ProteinMPNN-generated phage tail sequences will be stored and annotated in Benchling, and the 5 top-ranked phage tail sequences will be run through Nebula’s codon optimisation tool for bacteria. To test engineered phages against target and non-target bacteria, I’d use a phage spotting assay, which I’d automate with an Opentrons protocol.

For my idea 3: To create a GFP-based cancer mutation biosensor, I will use Benchling to design the genetic construct (GFP-antibody fragment/nanobody fusion, promoter, linker sequences), Ginkgo Nebula to codon optimise and synthesise the sequence, and an Opentrons protocol to automate the 96-well plate fluorescence assay, in which the GFP biosensor is exposed to mutant and non-mutant KRAS.

Final Project Ideas — DUE BY START OF FEB 24 LECTURE

BTS: Inspirations, content and why I chose these 3 project ideas?

My inspiration for the first idea, the in silico PETase substrate specificity project, was to understand how plastics other than PET could be degraded by the enzyme, and how introducing mutations at the stabilisation sites of the enzyme may increase its binding affinity for the plastic. More generally, I am very interested in bioremediation of plastic pollution, which is widespread.

My inspiration for the second idea was that phages use receptor binding proteins (RBPs) to bind to bacteria. As phages have a limited host range due to the high specificity of these proteins for particular bacterial surface structures, we can engineer the receptor binding loops of the RBP by swapping similar genes between phages, and through insertions, mutations and recombination, to change binding affinity for different bacterial surface proteins and broaden the phage host range.

My inspiration for the third idea came from my initial meeting at Lifefabs, about using phytoplankton as water quality biosensors, combined with what I learned at the BioBootcamp and in HTGAA classes about GFP: a biosensor that glows when mutations are detected would give a biological readout from an active molecular detector, helping to prevent the development of cancer downstream.

References

DeRoo, J. B., Jones, A. A., Slaughter, C. K., Ahr, T. W., Stroup, S. M., Thompson, G. B., & Snow, C. D. (2025). Automation of protein crystallization scaleup via Opentrons-2 liquid handling. SLAS Technology, 32, 100268. https://doi.org/10.1016/j.slast.2025.100268

Austin, H. P., Allen, M. D., Donohoe, B. S., Rorrer, N. A., Kearns, F. L., Silveira, R. L., Pollard, B. C., Dominick, G., Duman, R., El Omari, K., Mykhaylyk, V., Wagner, A., Michener, W. E., Amore, A., Skaf, M. S., Crowley, M. F., Thorne, A. W., Johnson, C. W., Woodcock, H. L., … Beckham, G. T. (2018). Characterization and engineering of a plastic-degrading aromatic polyesterase. Proceedings of the National Academy of Sciences, 115(19). https://doi.org/10.1073/pnas.1718804115

Week 4 HW: Protein Design Part I

Part B: Protein Analysis and Visualization

In this part of the homework, you will be using online resources and 3D visualization software to answer questions about proteins.

1. Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions. Briefly describe the protein you selected and why you selected it.

I chose the p53 protein, which triggers programmed cell death when a cell has extensive DNA damage caused by stresses such as UV light, oxygen radicals or chemicals. In a cancerous cell, the p53 protein will travel to the nucleus and signal the mitochondria to release reactive oxygen species or increase calcium levels. Other death factors released include cytochrome c, which activates caspases, and SMAC, which blocks survival proteins (Fogg et al., 2011). I selected this protein because mutations in it can cause cancer, and because it is vital for protecting the human genome from damage.

  1. Identify the amino acid sequence of your protein. How long is it? What is the most frequent amino acid? You can use this notebook to count most frequent amino acid - https://colab.research.google.com/drive/1vlAU_Y84lb04e4Nnaf1axU8nQA6_QBP1?usp=sharing

p53 is 393 amino acids long.

The most common amino acid is: P (Proline), which appears 45 times.
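The count above came from the linked notebook. As a minimal sketch of the same idea (not the notebook's actual code), the most frequent residue can be found with Python's `collections.Counter`. The function name `most_frequent_residue` and the toy input are my own illustrations; the toy string is not the p53 sequence.

```python
from collections import Counter


def most_frequent_residue(sequence):
    """Return (residue, count) for the most common amino acid.

    `sequence` is a one-letter amino acid string. Whitespace, digits and
    gap characters such as '-' are ignored, and case is normalised.
    """
    residues = [aa for aa in sequence.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    return Counter(residues).most_common(1)[0]


# Hypothetical toy sequence, used only to show the call.
toy_sequence = "MPPLSPPKA"
residue, count = most_frequent_residue(toy_sequence)
print(residue, count)  # P 4
```

To reproduce the p53 result, paste the sequence from the UniProt entry in place of `toy_sequence`.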

How many protein sequence homologs are there for your protein? Hint: Use the pBLAST tool to search for homologs and ClustalOmega to align and visualize them.

I found 175 total homologs using the p53 human version (https://www.uniprot.org/uniprotkb/P04637/entry).

My Clustal Omega alignment is below:

CLUSTAL O(1.2.4) multiple sequence alignment


Zebrafish_P53      -------------MAQNDSQEFAELWEKN----LISIQPPGGGSCWDII-----NDEEYL	38
Frog_P53           ----MEPSSETGMDPPLSQETFEDLWSLLPDPLQTVTCR-------------LDNLSEFP	43
Human_P53          ---MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLP---SQAMDDLMLSPDDIEQWF	54
Mouse_P53          MTAMEESQSDISLELPLSQETFSGLWKLLPPEDILPS-----PHCMDDLLL-PQDVEEFF	54
Rat_P53            ---MEDSQSDMSIELPLSQETFSCLWKLLPPDDILPTTATGSPNPMEDLFL-PQDVAELL	56
                                    ..: *  **.                           :  :  

Zebrafish_P53      ---PGSFDPNFFG-NV-----LEEQP------QPSTLPPTSTVPETSDYPGDHGFRLRFP	83
Frog_P53           D-YPLAADMTV------LQ--------EGLMGNAVPTVTSCAVPSTDDYAGKYGLQLDFQ	88
Human_P53          TEDPGPDEAPRMPEAAPPVAPAPATPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFL	114
Mouse_P53          E---GPSEALRVSGAPAAQDPVTETPGPVAPAPATPWPLSSFVPSQKTYQGNYGFHLGFL	111
Rat_P53            E---GPEEALQVS-APAAQEPGTEAPAPVAPASATPWPLSSSVPSQKTYQGNYGFHLGFL	112
                          :                               :. **. . * *.:*::* * 

Zebrafish_P53      QSGTAKSVTCTYSPDLNKLFCQLAKTCPVQMVVDVAPPQGSVVRATAIYKKSEHVAEVVR	143
Frog_P53           QNGTAKSVTCTYSPELNKLFCQLAKTCPLLVRVESPPPRGSILRATAVYKKSEHVAEVVK	148
Human_P53          HSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVR	174
Mouse_P53          QSGTAKSVMCTYSPPLNKLFCQLVKTCPVQLWVSATPPAGSRVRAMAIYKKSQHMTEVVR	171
Rat_P53            QSGTAKSVMCTYSISLNKLFCQLAKTCPVQLWVTSTPPPGTRVRAMAIYKKSQHMTEVVR	172
                   :.****** ****  ***:****.****: : *   ** *: :** *:**:*:*::***:

Zebrafish_P53      RCPHHERTP-DGDNLAPAGHLIRVEGNQRANYREDNITLRHSVFVPYEAPQLGAEWTTVL	202
Frog_P53           RCPHHERSVEPGEDAAPPSHLMRVEGNLQAYYMEDVNSGRHSVCVPYEGPQVGTECTTVL	208
Human_P53          RCPHHERCS-DSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIH	233
Mouse_P53          RCPHHERCS-DGDGLAPPQHLIRVEGNLYPEYLEDRQTFRHSVVVPYEPPEAGSEYTTIH	230
Rat_P53            RCPHHERCS-DGDGLAPPQHLIRVEGNPYAEYLDDKQTFRHSVVVPYEPPEVGSDYTTIH	231
                   *******    .:. **  **:*****    * :*  : **** **** *: *:: **: 

Zebrafish_P53      LNYMCNSSCMGGMNRRPILTIITLETQEGQLLGRRSFEVRVCACPGRDRKTEESNFKKDQ	262
Frog_P53           YNYMCNSSCMGGMNRRPILTIITLETPQGLLLGRRCFEVRVCACPGRDRRTEEDNYTKKR	268
Human_P53          YNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKG	293
Mouse_P53          YKYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRDSFEVRVCACPGRDRRTEEENFRKKE	290
Rat_P53            YKYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRDSFEVRVCACPGRDRRTEEENFRKKE	291
                    :***********************  .* **** .*************:***.*  *. 

Zebrafish_P53      ETKTMAKTTTGTKRSLVKESSSATLRPEGSKKAKGSSSDEEIFTLQVRGRERYEILKKLN	322
Frog_P53           GLKPS------GKRELAHPPS---SEPPLPKKRLVVDDDEEIFTLRIKGRSRYEMIKKLN	319
Human_P53          EPHHELPPGS-TKRALPNNTS---SSPQPKKK----PLDGEYFTLQIRGRERFEMFRELN	345
Mouse_P53          VLCPELPPGS-AKRALPTCTS---ASPPQKKK----PLDGEYFTLKIRGRKRFEMFRELN	342
Rat_P53            EHCPELPPGS-AKRALPTSTS---SSPQQKKK----PLDGEYFTLKIRGRERFEMFRELN	343
                               ** *    *     *   **      * * ***:::**.*:*::::**

Zebrafish_P53      DSLELSDVVPASDAEKYRQKFMTKNKKENRGSSEPKQGKKLMVKDEGRSDSD	374
Frog_P53           DALELQESLDQQKVTI--------KCRKCRDEIKPKKGKKLLVKDEQPDSE-	362
Human_P53          EALELKDAQAGKEPGGSRAHS---SHLKSKKGQSTSRHKKLMFKTEGPDSD-	393
Mouse_P53          EALELKDAHATEESGDSRAHS---SYLKTKKGQSTSRHKKTMVKKVGPDSD-	390
Rat_P53            EALELKDAHAAEESGDSRAHS---SYPKTKKGQSTSRHK-PMIKKVGPDSD-	390
                   ::***.:    ..           .  : :   . .: *  :.*    ... 
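An alignment in this format can also be read programmatically. The sketch below is one assumed way to do it in plain Python: it joins the blocks for each sequence name, then reports ungapped lengths and a simple pairwise percent identity over columns where neither sequence has a gap. The helper names and the toy alignment are hypothetical and are not taken from the Clustal Omega tool itself.

```python
def parse_clustal(text):
    """Join the aligned blocks of a Clustal-format alignment per sequence name."""
    aligned = {}
    for line in text.splitlines():
        # Skip blank lines, the CLUSTAL header, and conservation lines
        # (conservation lines start with whitespace).
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, chunk = fields[0], fields[1]  # a trailing residue count is ignored
        aligned[name] = aligned.get(name, "") + chunk
    return aligned


def ungapped_length(aligned_seq):
    """Number of residues once gap characters are removed."""
    return len(aligned_seq.replace("-", ""))


def percent_identity(seq_a, seq_b):
    """Percent identical residues over columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical two-sequence alignment, used only to show the calls.
toy_alignment = "\n".join([
    "CLUSTAL O(1.2.4) multiple sequence alignment",
    "",
    "seqA      MK-LV 4",
    "seqB      MKALI 5",
    "          ** *:",
    "",
    "seqA      GG 6",
    "seqB      G- 6",
])

alignment = parse_clustal(toy_alignment)
print({name: ungapped_length(seq) for name, seq in alignment.items()})
print(percent_identity(alignment["seqA"], alignment["seqB"]))
```

Running the same functions on the p53 alignment text would give each species' sequence length and let you compare, for example, Human_P53 against Mouse_P53.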

Does your protein belong to any protein family?

The protein belongs to the p53 family.

Identify the structure page of your protein in RCSB. When was the structure solved? Is it a good quality structure? A good quality structure is one with good resolution; the smaller the value, the better (e.g. Resolution: 2.70 Å).

The structure for 1TUP (https://www.rcsb.org/structure/1TUP) was solved and made public in 1995. Its resolution is 2.20 Å. A smaller value means higher resolution, so it is a high quality structure.

Are there any other molecules in the solved structure apart from protein?

DNA which is bound and complexed with p53.

Does your protein belong to any structure classification family?

Open the structure of your protein in any 3D molecule visualization software (PyMol Tutorial Here; hint: ChatGPT is good at PyMol commands). Visualize the protein as “cartoon”, “ribbon” and “ball and stick”.

Color the protein by secondary structure. Does it have more helices or sheets?

Color the protein by residue type. What can you tell about the distribution of hydrophobic vs hydrophilic residues?

Visualize the surface of the protein. Does it have any “holes” (aka binding pockets)?
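The visualization steps above could be scripted. This is a minimal sketch that assumes PyMOL's standard command language (`fetch`, `hide`, `show`, `color`, `set`): a small Python helper writes the commands out as text that can be pasted into the PyMOL command line or saved as a `.pml` file. The function name `build_pymol_script`, the colours, and the set of residues treated as hydrophobic are my own assumptions, not part of the assignment.

```python
def build_pymol_script(pdb_id):
    """Return PyMOL command-language lines, one view per group of commands."""
    # Assumption: these residues are treated as hydrophobic for colouring.
    hydrophobic = "ALA+VAL+LEU+ILE+MET+PHE+TRP+PRO"
    return [
        f"fetch {pdb_id}",
        # Cartoon view, coloured by secondary structure
        "hide everything",
        "show cartoon",
        "color red, ss h",        # helices
        "color yellow, ss s",     # sheets
        "color green, ss l+''",   # loops
        # Ribbon view
        "hide everything",
        "show ribbon",
        # Ball-and-stick style view (thin sticks plus small spheres)
        "hide everything",
        "show sticks",
        "show spheres",
        "set stick_radius, 0.15",
        "set sphere_scale, 0.25",
        # Colour by residue type: everything cyan, then hydrophobic residues orange
        "color cyan, all",
        f"color orange, resn {hydrophobic}",
        # Surface view, for looking at pockets
        "hide everything",
        "show surface",
    ]


script_lines = build_pymol_script("1TUP")
print("\n".join(script_lines))
```

The commands run top to bottom, so each `hide everything` resets the view before the next representation is shown. To inspect one view at a time, run only the lines for that view.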

References

Fogg, V. C., Lanning, N. J., & MacKeigan, J. P. (2011). Mitochondria in cancer: At the crossroads of life and death. Chinese Journal of Cancer, 30(8), 526–539. https://doi.org/10.5732/cjc.011.10018