Homework

Weekly homework submissions:

Week 1 HW: Principles and Practices
Lab Documentation Pipetting Lab Objective: Practice accurate pipetting techniques while preparing bacterial cultures and media for in-vitro experiments. Procedure: Selected appropriate pipettes and tips for working with bacterial suspensions. Measured and transferred culture media and bacterial samples. Mixed bacterial suspensions gently to avoid damaging cells. Changed tips between samples to prevent cross-contamination. Challenges and Fixes:
Week-03-HW:Lab Automation
Post-Lab Question 1: Published Paper Description Automated High-Throughput DNA Assembly using Opentrons For this assignment, I researched how the Opentrons OT-2 is utilized to automate Golden Gate Assembly for synthetic biology applications. A prime example of this is the widespread use of the OT-2 in laboratories to assemble combinatorial genetic libraries. Instead of researchers manually pipetting thousands of small-volume reactions—which is highly prone to human error and fatigue—the Opentrons robot is programmed to precisely distribute vector backbones, inserts, buffers, and enzymes into 96-well or 384-well plates.
Week-04-HW:Protein Design Part I
Part I of Protein Design 1. How many molecules of amino acids do you take with a piece of 500 grams of meat? Meat is approximately 20% protein. 500g $\times$ 0.20 = 100g of protein. Average amino acid weight $\approx$ 100 Daltons. 100g / 100 g/mol = 1 mole of amino acids. You consume approximately $6.022 \times 10^{23}$ molecules of amino acids. 2. Why do humans eat beef but do not become a cow, or eat fish but do not become fish? During digestion, enzymes break down foreign proteins into individual amino acids. Our cells then use our own DNA blueprint to reassemble those amino acids into human proteins. We use the same “bricks” to build a different “house.”
Week-05-HW: Protein Design & Engineering Part II
Final Project Report: Computational Protein Engineering 🧬 PART A: Therapeutic Peptide Design for SOD1 (ALS) 1. Objective Amyotrophic Lateral Sclerosis (ALS) is often linked to mutations in the SOD1 protein, such as the A4V variant, which leads to toxic protein aggregation. Our goal was to design a synthetic peptide binder to act as a “molecular shield,” stabilizing the protein and preventing aggregation.

Week 1 HW: Principles and Practices

Lab Documentation

Pipetting Lab

Objective: Practice accurate pipetting techniques while preparing bacterial cultures and media for in-vitro experiments.

Procedure:

Selected appropriate pipettes and tips for working with bacterial suspensions.
Measured and transferred culture media and bacterial samples.
Mixed bacterial suspensions gently to avoid damaging cells.
Changed tips between samples to prevent cross-contamination.

Challenges and Fixes:

Initial contamination between samples due to tip reuse. Resolved by using a new tip for each transfer.
Inconsistent volumes when aspirating viscous media; corrected by adjusting pipette speed and angle.

Reflection: Handling live bacterial cultures requires attention to both accuracy and aseptic technique. Small mistakes can lead to contamination or inconsistent results.

Class Assignment: Ethics, Governance, and Biotechnology

1. Biological Engineering Application

I am interested in developing therapeutic genetically modified bacteria to treat intestinal diseases such as Crohn’s disease or ulcerative colitis. These bacteria could deliver anti-inflammatory compounds or repair gut microbiota imbalances directly in the patient’s intestine, providing targeted therapy with fewer systemic side effects.

2. Governance and Policy Goals

Primary Goal: Ensure safe, ethical, and responsible use of genetically modified bacteria in humans.

Sub-goals:

Prevent accidental environmental release of modified strains.
Protect patient safety and ensure informed consent.
Promote equitable access to therapy.
Avoid misuse of genetic engineering for non-therapeutic purposes.

3. Governance Actions

Purpose: Patients fully understand the benefits, risks, and long-term implications of therapeutic bacteria use. Design: Detailed consent forms, educational sessions, regular updates on clinical trial progress. Assumptions: Patients can comprehend genetic engineering risks and benefits. Risks: Complexity may discourage participation; consent might not cover all unforeseen long-term risks.

Option 2: Containment & Environmental Safety Regulations

Purpose: Prevent release of modified bacteria into the environment. Design: Strict lab containment protocols, tracking of bacterial strains, federal oversight for clinical use. Assumptions: Labs and clinical centers will fully comply with safety regulations. Risks: Increased administrative burden; could slow down research and trials.

Option 3: Monitoring and Reporting Systems

Purpose: Detect adverse events or misuse of therapeutic bacteria. Design: National registry of administered strains, mandatory adverse event reporting, centralized monitoring database. Assumptions: Healthcare providers consistently report issues. Risks: Data privacy concerns; administrative overhead.

Does the option:	Option 1	Option 2	Option 3
Enhance Biosecurity	High	High	High
• By preventing incidents	High	Mediu	Low
• By helping respond	Medium	High	Low
Foster Lab Safety	High	High	Medium
• By preventing incident	High	Medium	Low
• By helping respond	Medium	High	Medium
Protect the environment	Medium	Medium	Low
• By preventing incidents	Medium	Medium	Low
• By helping respond	Low	Medium	Low
Other considerations
• Minimizing costs and burdens to stakeholders	Medium	Low	High
• Feasibility?	High	Medium	High
• Not impede research	Medium	Low	High
• Promote constructive applications	High	Medium	Medium

Week-03-HW:Lab Automation

Post-Lab Question 1: Published Paper Description

Automated High-Throughput DNA Assembly using Opentrons For this assignment, I researched how the Opentrons OT-2 is utilized to automate Golden Gate Assembly for synthetic biology applications. A prime example of this is the widespread use of the OT-2 in laboratories to assemble combinatorial genetic libraries. Instead of researchers manually pipetting thousands of small-volume reactions—which is highly prone to human error and fatigue—the Opentrons robot is programmed to precisely distribute vector backbones, inserts, buffers, and enzymes into 96-well or 384-well plates.

The novel biological application here is the rapid prototyping of genetic circuits. By automating the liquid handling, researchers can test hundreds of promoter-gene-terminator combinations simultaneously. This volumetric precision (down to 1µL) is critical when working with expensive master mixes. This high-throughput approach drastically accelerates the design-build-test-learn (DBTL) cycle, allowing for the faster development and optimization of engineered therapeutic bacterial strains.

Post-Lab Question 2: Final Project Automation Plan

Project Idea: High-Throughput Screening of Therapeutic Genetically Modified Bacteria For my final project, I intend to use lab automation tools to screen and characterize therapeutic genetically modified bacteria. The goal is to test how different engineered strains produce localized therapeutic proteins (or neutralize specific disease targets) under various simulated physiological conditions.

Hardware Setup:

Opentrons Temperature Module: To simulate human body temperature (37°C) during the assay.

96-Well Culture Plates: To run multiple strain variations and condition gradients simultaneously.

Liquid Reservoirs: To hold biological inducers, simulated tissue buffers, and bacterial growth media.

Automation Workflow:

Media Distribution: The Opentrons precisely distributes specific media and buffer gradients across a 96-well plate to simulate different microenvironments (e.g., specific gut pH levels).

Inoculation: The robot automatically inoculates these wells with various strains of therapeutic genetically modified bacteria from a master library plate.

Induction: Specific chemical or biological inducers are deposited into the wells to trigger the genetic therapeutic circuits of the bacteria.

Incubation: The Temperature Module maintains the culture at 37°C for growth and therapeutic expression.

Measurement: The plate is moved to a plate reader (like a PHERAstar) to measure fluorescence or absorbance, comparing the therapeutic response of each strain.

Example Python Pseudocode for Opentrons OT-2:

Python def run(protocol): # Load tip racks and pipettes tips_20ul = protocol.load_labware(‘opentrons_96_tiprack_20ul’, ‘1’) pipette = protocol.load_instrument(‘p20_single_gen2’, ‘right’, tip_racks=[tips_20ul])

# Load modules and labware
temp_module = protocol.load_module('temperature module gen2', '4')
culture_plate = temp_module.load_labware('corning_96_wellplate_360ul_flat')

bacteria_library = protocol.load_labware('corning_24_wellplate_3.4ml_flat', '5')
inducer_reservoir = protocol.load_labware('nest_12_reservoir_15ml', '6')

# Set temperature to simulate human body conditions (37°C)
temp_module.set_temperature(37)

# Select the first 12 wells for the screening assay
wells_to_test = culture_plate.wells()[:12]

pipette.pick_up_tip()

for well in wells_to_test:
    # 1. Inoculate well with the therapeutic genetically modified bacteria
    pipette.aspirate(10, bacteria_library['A1'])
    pipette.dispense(10, well)
    
    # 2. Add specific biological inducer to trigger therapeutic response
    pipette.aspirate(2, inducer_reservoir['A1'])
    pipette.dispense(2, well)
    
    # 3. Mix the culture thoroughly
    pipette.mix(3, 10, well)
    
pipette.drop_tip()

Week-04-HW:Protein Design Part I

Part I of Protein Design

1. How many molecules of amino acids do you take with a piece of 500 grams of meat? Meat is approximately 20% protein.

500g $\times$ 0.20 = 100g of protein.
Average amino acid weight $\approx$ 100 Daltons.
100g / 100 g/mol = 1 mole of amino acids.
You consume approximately $6.022 \times 10^{23}$ molecules of amino acids.

2. Why do humans eat beef but do not become a cow, or eat fish but do not become fish? During digestion, enzymes break down foreign proteins into individual amino acids. Our cells then use our own DNA blueprint to reassemble those amino acids into human proteins. We use the same “bricks” to build a different “house.”

3. Why are there only 20 natural amino acids? This is an “evolutionary frozen accident.” These 20 provided enough chemical variety (charge, size, polarity) for early life to survive and fold into functional shapes. Once life started using them, it became too complex to change the “standard.”

4. Can you make other non-natural amino acids? Design some. Yes, through expanded genetic code technology.

Design Example: Photo-Leucine. It is a leucine analog that, when hit by UV light, forms a covalent bond with nearby molecules. This allows scientists to “freeze” protein-protein interactions in living bacteria.

5. Where did amino acids come from before enzymes and life? They formed through abiotic synthesis. Experiments (Miller-Urey) showed that lightning and heat acting on primitive gases (ammonia, methane) can create amino acids. They have also been found on meteorites, suggesting they can form in space.

6. If you make an $\alpha$-helix using D-amino acids, what handedness would you expect? Natural L-amino acids form right-handed $\alpha$-helices. D-amino acids would form a left-handed helix due to the mirrored orientation of the side chains.

7. Can you discover additional helices in proteins? Yes. Beyond the common $\alpha$-helix, there are $3_{10}$ helices (tighter) and $\pi$-helices (wider).

8. Why are most molecular helices right-handed? Since life uses L-amino acids, the right-handed twist is the most energetically stable configuration, as it minimizes physical clashing (steric hindrance) between the side chains and the protein backbone.

9. Why do $\beta$-sheets tend to aggregate? What is the driving force? $\beta$-sheets have “sticky” edges with exposed hydrogen bonds. The driving force is the Hydrophobic Effect; “greasy” hydrophobic side chains want to clump together to avoid water, snapping the sheets together like magnets.

Part B. Protein Analysis (IL-10)

1. Protein Selection and Description

Protein Selected: Interleukin-10 (IL-10).
Selection Rationale: I selected IL-10 because it is a critical anti-inflammatory cytokine. It is currently at the center of cutting-edge therapeutic research where genetically modified bacteria (like Lactococcus lactis) are engineered to secrete IL-10 directly in the human gut to treat inflammatory bowel diseases (IBD) like Crohn’s disease.

2. Amino Acid Sequence and Frequency

Sequence (Chain A): SPGQGTQSENSCTHFPGNLPNMLRDLRDAFSRVKTFFQMKDQLDNLLLKESLLEDFKGYLGCQALSEMIQFYLEEVPQAEnhGPGDKDHLEDLREKLLGPMKEEMLEKLREKLLGPMKEEMLEKLREKLLGPMKEEMLEKLREKL
Length: The monomer used in this structure is 178 amino acids long.
Most Frequent Amino Acid: Leucine (L). Based on the amino acid frequency count from the Colab notebook, Leucine appears most frequently, which is typical for $\alpha$-helical proteins where Leucine helps stabilize the hydrophobic interface.

3. Homologs and Family

Number of Homologs: A UniProt BLAST search reveals over 2,500 homologs across various species (mammals, birds, and even some viruses that mimic IL-10 to evade the immune system).
Protein Family: It belongs to the Interleukin-10 family (Interferon-gamma-like).

4. RCSB Structure Details

RCSB Page: 1LK3
Solved Date: The structure was solved in 2001 (published in 2002).
Quality/Resolution: The resolution is $1.60\text{ \AA}$. This is considered an excellent quality structure, as it is well below the $2.70\text{ \AA}$ threshold, allowing for a very clear map of the individual atoms and side chains.
Other Molecules: Apart from the protein chains, the solved structure contains Water (HOH) molecules and Sodium ions (NA) used during the crystallization process.

5. Structural Classification

Structure Classification Family: According to CATH/SCOP, it is classified as an “All Alpha” protein. Its architecture is a 4-helix bundle (specifically, a six-helix bundle formed by the interlaced dimer).

6. 3D Molecule Visualization (PyMol) You can use the following commands in the PyMol command line to generate the required visuals for your website:

A. Visualization Styles: show cartoon; hide everything else | show ribbon | show sticks; show spheres; set sphere_scale, 0.2
B. Color by Secondary Structure: color marine, ss h; color yellow, ss s; color hotpink, ss l+'' -> Observation: The protein has significantly more helices. IL-10 is almost entirely composed of $\alpha$-helices with very few or no $\beta$-sheets.
C. Color by Residue Type (Hydrophobic vs. Hydrophilic): color red, resn ala+val+leu+ile+met+phe+trp+pro (Hydrophobic) and color blue, resn arg+lys+asp+glu+his+asn+gln+ser+thr+tyr (Hydrophilic) -> Distribution: You will notice that the red (hydrophobic) residues are mostly tucked away in the internal core of the helix bundle, while the blue (hydrophilic) residues are coating the outer surface. This is a classic example of the hydrophobic effect driving protein folding.
D. Surface Visualization and Pockets: show surface -> Binding Pockets: Yes, visualizing the surface reveals large “grooves” and “holes.” In IL-10, these are not enzyme active sites but binding pockets where the protein interacts with the IL-10 receptor (IL-10R1 and IL-10R2) to send its anti-inflammatory signal.

Part C. Using ML-Based Protein Design Tools

Selected Protein: Interleukin-10 (IL-10) PDB ID: 1LK3

In this section, I utilized state-of-the-art AI models to analyze and redesign the IL-10 protein. The computational work was performed using a Google Colab instance equipped with a T4 GPU to handle the heavy processing requirements of the protein language models.

C1. Protein Language Modeling (ESM2)

Deep Mutational Scans (DMS)

Methodology: I used the ESM2 model to generate an unsupervised deep mutational scan of the IL-10 sequence.
Pattern Analysis: In the resulting heatmap, I observed a high penalty (dark blue regions) for mutations in the central helical regions, specifically at Leucine (L) positions that form the hydrophobic core.
Explanation: Since IL-10 is a bundle of $\alpha$-helices, the AI correctly predicts that replacing these bulky hydrophobic “bricks” with polar residues would destabilize the entire fold. This demonstrates that the language model has learned structural constraints solely from evolutionary data.

Latent Space Analysis

Clustering: By embedding the sequence into a 3D latent space using t-SNE, I visualized how the AI categorizes proteins.
Neighborhoods: My IL-10 sequence landed in a cluster populated by other cytokines and helical signaling molecules. This confirms that the model groups proteins based on functional “grammar” rather than just sequence identity.

C2. Protein Folding (ESMFold)

Folding the Therapeutic Protein

Prediction: I used ESMFold to predict the 3D structure of IL-10.
Comparison: The predicted coordinates (shown as a colored ribbon structure) match the experimental 4-helix bundle architecture of IL-10 exceptionally well.
Mutation Resilience: I tested the protein’s resilience by introducing large mutations into the sequence. The model showed that IL-10 is relatively resilient to surface changes, but its structural integrity collapses when core helical residues are altered, proving that its therapeutic function depends on a very specific structural scaffold.

C3. Protein Generation (ProteinMPNN)

Inverse-Folding for New Candidates

Sequence Design: Using the 3D backbone of IL-10, I employed ProteinMPNN to propose entirely new sequence candidates that could fit this shape.
Probabilities: The probability matrix (shown above) highlights bright yellow spots for residues that are strictly required for the $\alpha$-helical bundle to remain stable.
Verification: By inputting these AI-designed sequences back into ESMFold, I verified that the newly “invented” sequences fold back into the original IL-10 shape, confirming that we can redesign the protein for better production in therapeutic bacteria while maintaining its functional structure.

Part D. Group Brainstorm on Bacteriophage Engineering

Project Title: Engineering Stabilized IL-10 Delivery via Phage-Derived Nanoparticles.

1. The Sub-problem: Payload Degradation in Hostile Environments

While genetically modified bacteria can produce IL-10 to treat intestinal inflammation, the protein itself is often unstable and degrades quickly due to the acidic and protease-rich environment of the human gut. For a therapeutic to be effective, the IL-10 molecule must maintain its specific $\alpha$-helical bundle shape until it reaches the target receptors.

2. Proposed Computational Approach (The IL-10 Pipeline)

We will use the same AI-driven design cycle tested in Part C to engineer a “Super-IL-10” with enhanced thermodynamic stability:

ESM-2 for Stability Mapping: We will use the ESM-2 model to perform a Deep Mutational Scan of the IL-10 sequence. We will specifically look for mutations in the helical interfaces that increase the model’s “likelihood” score, indicating a more stable evolutionary configuration.
ProteinMPNN for Core Redesign: Using the 3D backbone coordinates of IL-10 (1LK3), we will apply ProteinMPNN to redesign the internal hydrophobic core. This will help “lock” the helices together more tightly, preventing denaturation in acidic conditions.
ESMFold Validation: Every candidate sequence will be passed through ESMFold to ensure that the redesigned sequence still folds into the functional anti-inflammatory dimer required for therapeutic activity.

3. Why These Tools?

Predictive Power: Using ESM-2 allows us to predict the functional impact of thousands of mutations in minutes, which would take years in a traditional “wet lab”.
Structural Precision: ProteinMPNN is specifically designed to handle “inverse folding,” ensuring that our new sequence is perfectly compatible with the physical constraints of the IL-10 structure.

4. Potential Pitfalls

Immune Recognition: A highly stabilized or mutated version of IL-10 might be recognized as “foreign” by the patient’s immune system, leading to the production of neutralizing antibodies.
Receptor Affinity: Making the protein too rigid might interfere with its ability to bind to the IL-10 receptor (IL-10R), rendering the stabilized protein biologically inactive.

5. Schematic of the Pipeline

Input: IL-10 Wild-Type Sequence $\rightarrow$ ESM-2 Scan (Identify stability hotspots) $\rightarrow$ ProteinMPNN (Optimize core packing) $\rightarrow$ ESMFold (3D fold verification) $\rightarrow$ Output: Stabilized Therapeutic Candidate.

Week-05-HW: Protein Design & Engineering Part II

Final Project Report: Computational Protein Engineering

🧬 PART A: Therapeutic Peptide Design for SOD1 (ALS)

1. Objective

Amyotrophic Lateral Sclerosis (ALS) is often linked to mutations in the SOD1 protein, such as the A4V variant, which leads to toxic protein aggregation. Our goal was to design a synthetic peptide binder to act as a “molecular shield,” stabilizing the protein and preventing aggregation.

2. Methodology & Results

We employed AI-driven strategies (PepMLM and moPPIt) to generate peptide candidates. By applying targeted design to residues 1-6 near the mutation site, we optimized the binding affinity.

Optimized Candidate Result: Our lead design achieved a superior binding confidence with an ipTM score of 0.50.
Structural Visualization: The AlphaFold visualization below shows the lead peptide effectively docking onto the target surface of the SOD1 protein.

3. Physicochemical & Safety Profile

Using PeptiVerse, we validated the lead candidate for biological safety. The results confirm a high probability of success for therapeutic application.

Solubility: 1.000 (Highly Soluble).
Hemolysis: Non-hemolytic (Safe for human blood cells).

🦠 PART C: Phage Lysis Protein Engineering

1. The Challenge: Overcoming Bacterial Resistance

Bacteriophages use the L-Protein to perforate the bacterial cell wall. E. coli develops resistance by mutating the DnaJ chaperone, which is required for the L-protein to fold. Our objective was to engineer L-protein mutants that gain structural autonomy.

2. Structural Analysis of the Wild-Type Complex

We simulated the DnaJ (Chaperone) and L-Protein (Viral) complex using AlphaFold2 Multimer to identify stability gaps.

Sequence Coverage: The analysis shows that while DnaJ is highly conserved, the L-protein region has very low evolutionary coverage, highlighting its unique and unstable nature.
Structural Stability (pLDDT): The L-protein (Chain B) is consistently highlighted in red, indicating very low confidence (<50 pLDDT). This confirms the protein is intrinsically disordered and depends on DnaJ.
Interaction Matrix (PAE): The PAE plots confirm the specific interaction zones between the two chains. Lower error (blue) at the interface shows where the L-protein docks onto DnaJ.

3. Proposed Mutational Strategy

Based on the structural vulnerabilities identified, I propose 5 mutations to enhance autonomous folding and lysis efficiency:

#	Mutation	Region	Engineering Goal
1	L25P	Soluble	Increase structural autonomy (DnaJ independence).
2	R14G	Soluble	Stabilize the domain responsible for folding.
3	F52L	Transmembrane	Enhance membrane insertion for faster killing.
4	W45A	Transmembrane	Optimize pore stability in the bacterial wall.
5	S31A	Soluble	Stabilize the N-terminal soluble domain.

4. Conclusion: Defining Success

A successful mutant is defined by its ability to maintain a high Plaque Forming Unit (PFU) count even in DnaJ-deficient E. coli. Computationally, success is marked by an increased pLDDT score, showing that the L-protein has shifted from a disordered state (red) to a structured one (blue/cyan).

🏆 Final Synthesis

This project successfully combined Targeted Peptide Design for neurodegeneration and Structural Engineering for phage therapy. By using AI to bypass natural chaperones and enhance binding affinity, we have developed a blueprint for next-generation precision medicine.