Hi, I’m Eleonora, a junior in Bio‑Convergence at Yonsei University. 🧪 I’m especially interested in immunoengineering and biofabrication, and I’ve recently become curious about how design can shape the way we understand and communicate biology. 🎨 This is my first time diving deeply into synthetic biology and testing my skills, so through this course I hope to learn new tools, see different applications of bioengineering, build a trusting community, and explore how creative I can be in this field and how design and biology can work together. 🌱
Assignment 1: Python Script for Opentrons Artwork This week we are creating a Python file to run on an Opentrons OT-2 liquid handling robot to create fluorescent designs. Using the provided website, I created a small “Cherry” pattern. I have little experience coding on such platforms, so Google Gemini was a big help while writing the code: https://colab.research.google.com/drive/1kZZStiHlPdG17vqHZPM2IhAQ3vTWkMRb#scrollTo=pczDLwsq64mk&line=76&uniqifier=1
Part A Some background:
Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS). ALS is a heterogeneous, severe neurodegenerative disorder, the hallmark of which is an adult-onset loss of upper and lower motor neurons. It leads to progressive paresis and atrophy of skeletal muscles, resulting in quadriplegia and fatal respiratory failure. The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation. Task: Design short peptides that bind mutant SOD1 and then decide which ones are worth advancing toward therapy.
DNA Assembly
What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
Phusion DNA Polymerase
Chimeric enzyme that catalyzes the synthesis of a new DNA strand in the 5′ → 3′ direction with high fidelity
dNTPs
The four chemical building blocks (dATP, dTTP, dCTP, dGTP) used to construct the DNA. They provide both the physical material and the energy required for the polymerase to grow the new strand
Reaction Buffer
Subsections of Homework
Week 1 HW: Principles and Practices
Question 1 – Application & why
First, describe a biological engineering application or tool you want to develop and why.
Introduction
My proposition for a biological engineering application is a synthetic cell circuit for neuroprotection in neurodegenerative diseases that is non-invasively controlled by a physical sound/ultrasound signal to help modulate inflammation and support brain health.
Motivation
During my junior year, I started learning about neurodegenerative diseases and current therapies. I came across a lot of reading on non-pharmacological tools, such as music therapy, that are used as complementary support rather than as precise, controlled interventions. My interest was in going beyond background music therapy and instead using acoustic stimulation to its full potential, as one possible non-invasive control channel for an engineered neuro-immune circuit.
Synthetic biology has already shown that mammalian cells can be engineered with mechanogenetic and sonogenetic switches that trigger therapeutic gene expression via responsive receptors or promoters. Music and music-like acoustic interventions could be engineered to play the role of an external controller that does not need to be injected or to come into physical contact with the patient.
Design
A simple example would be an acoustic‑controlled promoter driving anti‑inflammatory cytokines such as IL‑10 or TGF‑β, neurotrophic factors like BDNF or GDNF, or enzymes that enhance clearance of toxic proteins such as Aβ.
The core logic gate would be an AND gate that requires both an acoustic input and a local inflammatory signal (for example, NF‑κB activation) before turning on the therapeutic gene, so that the circuit activates only when the brain is inflamed and the specific sound signal is applied.
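The AND-gate logic described above can be written out as a small truth table. This is only a minimal sketch: the function name and the two boolean inputs are hypothetical stand-ins for the acoustic signal and NF‑κB activation, not a model of real promoter behavior.

```python
from itertools import product


def therapeutic_gene_on(acoustic_signal: bool, nfkb_active: bool) -> bool:
    """Hypothetical AND gate: express the therapeutic gene only if both inputs are present."""
    return acoustic_signal and nfkb_active


def truth_table():
    """Return every input combination together with the gate output."""
    return [(a, n, therapeutic_gene_on(a, n)) for a, n in product([False, True], repeat=2)]


for acoustic, inflamed, output in truth_table():
    print(f"acoustic={acoustic!s:5} NF-kB={inflamed!s:5} -> gene on: {output}")
```

Only the row where both inputs are true switches the gene on, which is the behavior the design asks for.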
Question 2 – Governance goals
Next, describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals.
Goal 1: Long-term biological safety of use
Ensure that sound-controllable synthetic immune circuits are designed and used in a way that is biologically safe and technically trustworthy.
Sub goal 1.1. Manage biological and technical risks
Identify key risks and mitigate them through targeted circuit design.
Sub goal 1.2. Robust testing and monitoring
Ensure there is detailed preclinical testing and long-term clinical monitoring before device deployment
Goal 2: Protection and respectful use in memory-impaired patients
Protect the rights and autonomy of neurodegenerative patients who receive this treatment and avoid health inequalities
Sub goal 2.1. Control and consent
Develop a specialised consent process that does not violate the rights of memory-impaired patients
Sub goal 2.2. Ability to withdraw
Ensure patients can decline the intervention or request deactivation/removal of the circuit
Sub goal 2.3. Promote equity in access
Allow public health systems and diverse patient groups to benefit from this technology
Question 3 – Governance actions
Next, describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”). Try to outline a mix of actions …
Option 1: Establishing Regulation Rules and Technical Standards
Purpose: Outline clear guidelines for such circuits to create standardized safety requirements before any medical implementation and fabrication.
Design: The regulators for such action would include national FDA-like agencies, neurology societies, and expert committees. A specific category and preclinical studies would be defined to mitigate potential risks of off-target activation, long-term expression, response to repeated acoustic exposure, and biological safety.
A “safety checklist” could be developed for synthetic switches, together with minimum requirements for acoustic parameters.
Assumptions: This assumes developers would agree to additional testing and expert review for approval.
Risks: If the standards are too weak, circuits could be fabricated without accounting for unknown long-term risks. Conversely, overly complicated standards might make the whole project too expensive and unachievable.
Option 2: Setting Advance Directives
Purpose: Build a system that lets patients with neurodegenerative disease state their wishes in advance and appoint a trusted person to help control when and how the acoustic stimulation is used if their memory or decision‑making declines.
Design: Use advance directive forms specific to this intervention, completed while the patient still has capacity, where they can (a) record preferences about starting, pausing, or stopping stimulation, and (b) designate a person/guardian who is allowed to initiate, schedule, or terminate acoustic stimulation.
Assumptions: Assumes patients receive a diagnosis early enough, and with enough support, to complete advance directives; that legal systems recognize such documents and surrogate decision‑makers for neuromodulation or implantable synbio interventions; and that clinicians have time and training to revisit consent and preferences over time.
Risks: Some patients may never complete directives, leaving families and clinicians uncertain; designated guardians might have conflicts of interest or interpret wishes differently from what the patient would want. Strict reliance on old directives could also override a patient’s current expressions if they still have partial capacity or have changed their mind, which could undermine respect for present‑time autonomy.
Option 3: Ensure transparency and public access
Purpose: Communicate the demonstrated safety and effectiveness to the public, along with a clear account of all risks, benefits, and intervention procedures.
Design: Build a public-interest campaign and communication platform that explains the technology and treatment procedures, including uncertainty and possible side effects. Require recruitment of diverse groups in clinical trials. Do not limit the research to private research hospitals.
Assumptions: Health systems are willing to invest in high-quality communication and marketing to reach diverse communities.
Risks: If the communication campaign is too successful, the public may overestimate benefits or underestimate uncertainty and risks. Policies to ensure inclusive trials and access may increase costs and administrative complexity for hospitals.
Question 4 – Scoring the options
Next, score (from 1–3 with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals.
| Does the option: | Option 1 | Option 2 | Option 3 |
| --- | --- | --- | --- |
| Enhance Biosecurity | 1 | 2 | 3 |
| • By preventing incidents | 1 | 2 | 3 |
| • By helping respond | 1 | 2 | 3 |
| Foster Lab Safety | 1 | 2 | 3 |
| • By preventing incident | 1 | 1 | 3 |
| • By helping respond | 1 | 2 | 3 |
| Protect the environment | n/a | n/a | n/a |
| • By preventing incidents | n/a | n/a | n/a |
| • By helping respond | n/a | n/a | n/a |
| Other considerations | 2 | 2 | n/a |
| • Minimizing costs and burdens to stakeholders | 3 | 2 | 2 |
| • Feasibility? | 2 | 1 | 2 |
| • Not impede research | 3 | 1 | 1 |
| • Promote constructive applications | 2 | 2 | 2 |
Question 5 – Recommendation & reflection
Last, drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why …
According to the scoring table, I prioritize a combination of Options 1 and 2, which balances hospital-level ethics with rules approved by national regulators. This combination ensures that the biological tool is governed by both human-centric ethics and rigorous technical safety.
The target audience for this choice would be the FDA and NIS communities, together with international neurology groups and clinical trial approval committees.
Option 2 scores best (1) on feasibility and on not impeding research, and it supports patient autonomy; it uses existing hospital systems for quick consent processes and monitoring. Option 1 scores best (1) on biosecurity and lab safety prevention, adding uniform rules like safety checklists for acoustic frequencies. Together, they cover biological safety (Goal 1), patient rights (Goal 2), and fair access through trials (Goal 2) without major delays to research.
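As a rough cross-check of this ranking, the scores from the table in Question 4 can be averaged per option. The unweighted mean that skips n/a entries is my own simplifying assumption, not part of the assignment rubric; lower is better because 1 is the best score.

```python
# Scores copied row by row from the Question 4 table; None stands for n/a.
SCORES = {
    "Option 1": [1, 1, 1, 1, 1, 1, None, None, None, 2, 3, 2, 3, 2],
    "Option 2": [2, 2, 2, 2, 1, 2, None, None, None, 2, 2, 1, 1, 2],
    "Option 3": [3, 3, 3, 3, 3, 3, None, None, None, None, 2, 2, 1, 2],
}


def mean_score(values):
    """Unweighted mean of the scored rows, ignoring n/a entries."""
    scored = [v for v in values if v is not None]
    return sum(scored) / len(scored)


# Rank options from best (lowest mean) to worst.
ranking = sorted(SCORES, key=lambda name: mean_score(SCORES[name]))
for name in ranking:
    print(f"{name}: mean score {mean_score(SCORES[name]):.2f}")
```

Under this assumption Options 1 and 2 come out ahead of Option 3, which is consistent with prioritizing their combination.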
Considered Trade-Offs & Assumptions
This combination carries the risk of uneven standards across hospitals, since each hospital may have its own patient consent process, as well as higher costs and longer approval times.
Reflecting on what you learned and did in class this week, outline any ethical concerns that arose … then propose any governance actions you think might be appropriate to address those issues.
From the first week’s lesson and recitation, the topic that caught my attention was genetic engineering in pathogen research, such as studying viruses in bats or building synthetic genetic circuits in these organisms. Even simple work, such as modulating pathogens or implementing circuits in cells, carries big biosecurity risks. If not handled carefully, a dangerous pathogen could escape the lab, spread to people, or be misused. This made me think for a long time about how this issue is regulated now and how these experiments can be conducted safely without stopping important science.
Governance solutions
Mandatory additional training: Require specialized training for all lab workers on incident reporting, strict entry/exit protocols, and emergency response. This builds skills to prevent accidents, like pathogen leaks during bat virus studies.
Screening panels with oversight: Create independent review panels of scientists and safety experts to screen high-risk experiments (e.g., pathogen modulation or synthetic circuits). These panels would approve protocols, monitor ongoing work, and ensure regular audits—similar to dual-use research reviews.
Another frequently mentioned topic from class was “core libraries” in synthetic biology. Biobanks, genetic databases, and DNA sequence archives are presented like reusable IP blocks. In many cases, patient data or cells are taken without permission and used for science or profit.
Governance solutions
Broader consent involvement with time-limited withdrawal rights. When patients enter treatment, get broad consent for future unknown uses. Allow donors or families to withdraw from data access within a clear time period (e.g., 6-12 months). This protects privacy early on while preventing disruptions after data is already shared and in open research use.
Rules for data sharing, with modest benefits that track the contribution of each group.
Pre-lecture Questions
Homework Questions from Professor Jacobson:
Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?
Answer
DNA polymerases have an error rate of about 10^-6 errors per base (roughly one error per million bases copied).
The human genome is ~3.2 × 10^9 bp in length, so this creates a significant discrepancy: at that rate each copy of the genome would carry thousands of errors.
Biology fixes this with proofreading by polymerase and post‑replication mismatch repair (MutS/MutL/MutH etc.), which together reduce the error rate to roughly 10^-9 to 10^-10 per base.
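The size of the discrepancy is easy to check with a few lines of arithmetic. This is a minimal sketch: the two error rates are order-of-magnitude assumptions (about 10^-6 for polymerase alone and about 10^-9 after mismatch repair), not measured values.

```python
GENOME_LENGTH_BP = 3.2e9  # approximate length of the human genome


def expected_errors(error_rate_per_base, genome_length_bp=GENOME_LENGTH_BP):
    """Expected number of errors in one full copy of the genome."""
    return error_rate_per_base * genome_length_bp


# Assumed order-of-magnitude error rates, for illustration only.
ASSUMED_RATES = {
    "polymerase alone (~1e-6)": 1e-6,
    "with mismatch repair (~1e-9)": 1e-9,
}

for label, rate in ASSUMED_RATES.items():
    print(f"{label}: about {expected_errors(rate):,.1f} errors per genome copy")
```

With these assumptions the expected errors drop from a few thousand per copy to a handful, which is why the repair layers matter.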
How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?
Answer
An average human protein is ~330–350 amino acids long, which allows a massive number of possible DNA sequences (around 10^150) because of the redundancy of the genetic code.
Many possible codes “don’t work” for a series of reasons: secondary structure of the mRNA; poor codon usage or tRNA availability; and unintended splicing or binding sites.
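The order of magnitude can be estimated by counting synonymous codons in the standard genetic code. This is a sketch under two assumptions of mine: a protein of 340 residues and a uniform amino-acid composition. A real protein sequence would give a somewhat different number.

```python
from collections import Counter
from itertools import product
from math import log10

# Standard genetic code, with codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of synonymous codons per amino acid (stop codons excluded).
DEGENERACY = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def log10_num_encodings(protein):
    """log10 of the number of distinct DNA sequences encoding a given protein."""
    return sum(log10(DEGENERACY[aa]) for aa in protein)


def log10_encodings_uniform(length):
    """Same estimate for a protein of given length with uniform composition (assumption)."""
    mean_log = sum(log10(n) for n in DEGENERACY.values()) / len(DEGENERACY)
    return length * mean_log


print(f"~10^{log10_encodings_uniform(340):.0f} possible coding sequences")
```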
Homework Questions from Dr. LeProust:
What’s the most commonly used method for oligo synthesis currently?
Answer
The standard, most widely used method is solid‑phase phosphoramidite chemistry.
Why is it difficult to make oligos longer than 200nt via direct synthesis?
Answer
It is difficult to make long oligos via direct synthesis because of cumulative yield loss: every coupling step is slightly less than 100% efficient, and the losses multiply. By ~200 bases there are many truncated and error‑containing products, and it is hard to purify the correct full‑length oligo.
Why can’t you make a 2000bp gene via direct oligo synthesis?
Answer
A 2,000‑step phosphoramidite synthesis would give essentially zero yield of full-length product.
Instead, many shorter oligos are synthesized and then assembled enzymatically (PCR assembly, Gibson, etc.) into longer gene fragments.
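The cumulative yield loss can be illustrated with a short calculation. This is a sketch that assumes a per-step coupling efficiency of 99%, which is a hypothetical round number chosen for illustration; real efficiencies depend on the chemistry and the instrument.

```python
def full_length_yield(length_nt, coupling_efficiency=0.99):
    """Fraction of chains that reach full length after (length_nt - 1) coupling steps."""
    return coupling_efficiency ** (length_nt - 1)


for length in (20, 200, 2000):
    print(f"{length:>5} nt: {full_length_yield(length):.2e} of chains are full length")
```

Even at this assumed efficiency, only a small fraction of 200-nt chains are full length, and for 2,000 nt the fraction is vanishingly small, which is why long genes are assembled from shorter oligos.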
Essential for humans/animals: histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine, and arginine.
Animals already depend on the diet for multiple essential amino acids, including lysine, so making organisms “lysine‑dependent” is not a safe way to contain a synthetic organism. Though for movie purposes it is a fun scientific explanation.
Week 2 HW: DNA Read, Write, & Edit
Part 1 – Benchling & In-silico Gel Art
I used Benchling to design an in‑silico restriction digest of Lambda DNA. In Benchling, I created a customized restriction enzyme list for smoother later operations that included all the enzymes provided in the Week 2 HTGAA homework
Using Ronan’s website, I tried to create a “Bat signal” 🦇 pattern on the gel (hopefully you can see my vision too!)
This was my first attempt, where the lanes did not appear in the order I expected, so the pattern looked wrong…
To fix this, I renamed each “Digest” tab with numbers, because every new digest was appearing in a random order.
After running all the digests and then ordering the numbered lanes correctly, I finally obtained my intended DNA gel “Batman” pattern.
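What Benchling does in a virtual digest can be sketched in a few lines: find every recognition site, cut there, and list the fragment lengths that would appear as bands. This is a simplified illustration on a made-up toy sequence, not Lambda DNA. It assumes a linear molecule and a palindromic site searched on the top strand only; EcoRI (G^AATTC) is used as the example enzyme.

```python
def digest_linear(seq, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at every occurrence of `site`.

    `cut_offset` is the cut position within the site on the top strand
    (for example 1 for EcoRI, which cuts G^AATTC).
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    edges = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:])]


# Toy sequence (hypothetical) containing two EcoRI sites.
toy_sequence = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TT"
fragments = digest_linear(toy_sequence, "GAATTC", 1)
print(sorted(fragments, reverse=True))  # largest fragment first, like reading a gel from the top
```

Designing a gel pattern then comes down to picking, lane by lane, the enzyme whose fragment lengths put bands at the desired heights.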
Part 3 - DNA Design Challenge
Protein – TRPV1 (heat and “spicy” pain sensation)
TRPV1 is a cation channel expressed in nociceptive sensory neurons, where it detects noxious heat, low pH, and capsaicin (the main pungent compound in chili peppers) 🌶️. I chose TRPV1 because it directly links physical stimuli at the skin (heat or spicy chemicals) to electrical activity in pain pathways, making it a clear molecular mediator of sensory perception. Engineering the DNA sequence that encodes TRPV1 could tune its expression or gating properties, which is relevant for altering thermal pain sensitivity or designing cells that report damaging levels of heat.
Codon Optimization
For codon optimization, I planned to take my reverse‑translated TRPV1 coding sequence and run it through an online codon optimization tool to adapt codon usage to E. coli, replacing rare codons, adjusting GC content, and removing unwanted motifs while keeping the amino‑acid sequence unchanged. However, the TwistBioscience optimization tool was unavailable and other available web tools repeatedly failed on my long TRPV1 sequence, so for this homework I kept the reverse‑translated sequence from Part 3.2 as my working TRPV1 coding sequence and discussed codon optimization conceptually instead of providing a fully optimized sequence.
3.4: What technologies could be used to produce this protein from your DNA? Describe in your words the DNA sequence can be transcribed and translated into a protein. You may describe either cell-dependent or cell-free methods, or both.
Once I have a coding DNA sequence for TRPV1, I can synthesize it and clone it into an expression plasmid with a suitable promoter, ribosome binding site, and terminator. After transforming this plasmid into host cells such as E. coli or mammalian cells, RNA polymerase transcribes the TRPV1 gene into mRNA, and ribosomes translate the mRNA into the TRPV1 channel, which is inserted into the plasma membrane and opens in response to heat or capsaicin to generate pain signals. The same DNA sequence could also be used in a cell‑free transcription–translation mix to produce TRPV1 in vitro, still following the central dogma from DNA to RNA to protein
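The DNA to RNA to protein flow described above can be mimicked in code. This is a minimal sketch using the standard genetic code and a tiny made-up open reading frame, not the TRPV1 sequence. It ignores promoters, ribosome binding sites, and everything else a real expression system needs.

```python
from itertools import product

# Standard genetic code in RNA form, with codons enumerated in UCAG order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_dna):
    """Transcription: the mRNA matches the coding strand with T replaced by U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translation: read codons from the first base until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


toy_orf = "ATGGCTTAA"  # hypothetical ORF: start codon, one alanine codon, stop codon
mrna = transcribe(toy_orf)
print(mrna, "->", translate(mrna))
```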
Part 4
I created a new linear DNA sequence in Benchling named sfGFP, set the nucleotide type to DNA, and topology to Linear. In the sequence editor I pasted, in order, the example promoter BBa_J23106, RBS BBa_B0034 with spacer, start codon (ATG), the provided codon‑optimized sfGFP coding sequence, a 7×His tag at the C‑terminus, a stop codon (TAA), and the BBa_B0015 terminator, and added annotations for each feature (Promoter, RBS, sfGFP CDS, 7×His tag, Stop, Terminator).
Here you can see the screenshot from Benchling showing the sequence map: (https://benchling.com/s/seq-KNkSG9FjYrEgCrgZE0Id?m=slm-aiflv0AFXb7Fro539sLk)
On the Twist portal I selected the “Genes” product and chose the “Clonal Genes” option, since this provides my insert in a circular plasmid that can be transformed directly into E. coli. I imported the FASTA file of my sfGFP expression cassette as a nucleotide sequence, then chose a Twist cloning vector (pTwist Amp High Copy) as the backbone so that the final construct includes an origin of replication and ampicillin resistance. After Twist generated the plasmid design, I downloaded the GenBank file and re‑imported it into Benchling to view the full plasmid map with my annotated sfGFP expression cassette inserted:
Part 5
DNA Read 📖
What DNA would you want to sequence and why?
I would like to sequence DNA from banana (Musa species) to explore how similar or different it is from the human genome, especially because of the known fun fact stating that humans “share around half their genes” with banana.
By sequencing banana DNA, I would want to compare it to human gene sets and get an idea of where these similarities come from and what they lead to. 🍌
What technology would you use and why?
I would use Illumina sequencing‑by‑synthesis (second‑generation NGS), possibly complemented by nanopore (third‑generation) for long reads.
Input and prep: extract banana genomic DNA, fragment it, repair ends, ligate Illumina adapters, PCR‑amplify, then load on a flow cell
How it reads bases: clusters are formed on the flow cell. In each cycle, fluorescently labeled nucleotides are added, one base at a time, and the machine takes a picture. The color of each spot in each cycle tells you which base (A, T, C, or G) was added there.
Output: millions of short reads in FASTQ format, which can be assembled and compared to human genes
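To make the output step concrete, here is a small sketch of reading FASTQ records and decoding their quality strings. It assumes the common four-line record layout and Phred+33 quality encoding; the two reads are made-up examples, not banana data.

```python
import io


def parse_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            break
        sequence = handle.readline().strip()
        handle.readline()  # the '+' separator line
        quality = handle.readline().strip()
        # Phred+33: quality score = ASCII code of the character minus 33.
        yield header[1:], sequence, [ord(char) - 33 for char in quality]


# Two hypothetical reads, written inline so the example is self-contained.
toy_fastq = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nII5!\n")
records = list(parse_fastq(toy_fastq))
for read_id, sequence, scores in records:
    print(read_id, sequence, f"mean Q = {sum(scores) / len(scores):.1f}")
```

In practice the reads would be filtered by quality and then assembled or aligned before any comparison with human genes.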
DNA Write ✍🏽
What DNA would you want to synthesize (e.g., write) and why?
I would like to synthesize a genetic circuit for a “self‑adjusting” biomaterial, where cells inside a hydrogel can sense mechanical stress and then change the stiffness of the material. The idea is to have a material that becomes stiffer when it needs more support and softer when stress is too high, using gene expression instead of external tools. This could be useful for tissue engineering and mechanobiology, because many studies show that cell fate and behavior depend not only on stiffness, but also on how stiffness changes over time
What technology would you use to perform this DNA synthesis and why?
To build this circuit, I would use chip‑based DNA oligo synthesis plus clonal gene synthesis, and then assemble the parts into an expression cassette. Chip‑based synthesis is good for designing and producing many regulatory variants (different mechanosensitive promoters, crosslinker genes, degradation domains) in parallel, which is important when tuning a dynamic material
Essential steps
Design the circuit in silico: pick mechanosensitive promoter elements, choose coding sequences for matrix‑building proteins and matrix‑remodeling enzymes, then add RBSs and terminators
Order synthetic DNA fragments or full clonal genes from a synthesis provider, using chip‑based oligo synthesis to keep costs down for complex designs.
Assemble the fragments into plasmids, transform them into the chosen cell chassis, and verify by sequencing
Limitations
Complex construction can have a high error rate
Synthesis and cloning might take several days to weeks
Mechanosensitive elements characterized in 2D cultures may behave differently in 3D hydrogels
DNA Edit 🖆
What DNA would you want to edit and why?
I would like to edit DNA in cartilage‑related cells for athletes. An example would be figure skaters, who perform repeated high jumps and landings that put a very high impact on the knees and ankles. Many figure skaters develop overuse injuries and early degenerative changes in the ankle and knee joints. This leads to early retirement, sometimes while the athletes are still in their teens, and to extensive health problems. I would edit joint cartilage cells to be more regenerative, so that damaged cartilage can be repaired more effectively over time.
The target genes would be SOX9 and TGF-β pathway genes, since they are known to be the main pro-regenerative genes in cartilage.
I would not explicitly target genes related to the defensive functions of cartilage in order to prevent injuries, because that would raise ethical concerns.
What technology or technologies would you use to perform these DNA edits and why?
I would use CRISPR-based gene activation in joint-derived stem cells to upregulate SOX9 and TGF-β pathway genes. This approach uses guide RNAs that target promoters to boost the cells’ own existing genes without cutting DNA. It would explicitly focus on existing injuries.
Essential steps
Confirm that SOX9 and key TGF-β genes are pro-regenerative in articular cartilage, and design guide RNAs that bind promoter regions of SOX9 and TGF-β pathway genes in human joint cells
Build dCas9-activator plasmids for designed gRNAs
Deliver dCas9-activator and gRNA to the cell
Culture and differentiate edited cells towards cartilage
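The guide RNA design step can be sketched as a simple search for candidate target sites. This is only an illustration under assumptions that are not stated in the text above: an SpCas9-derived dCas9 (20-nt protospacer followed by an NGG PAM), the forward strand only, and a made-up toy sequence instead of a real SOX9 promoter. Real design tools also score off-target risk and distance to the transcription start site.

```python
import re


def find_protospacers(seq, spacer_len=20):
    """Return (position, protospacer, PAM) for every site followed by an NGG PAM.

    Only the forward strand is scanned; a lookahead is used so that
    overlapping candidates are all reported.
    """
    seq = seq.upper()
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Hypothetical toy "promoter" fragment containing a single NGG PAM.
toy_promoter = "A" * 20 + "TGG" + "CCCC"
hits = find_protospacers(toy_promoter)
for position, protospacer, pam in hits:
    print(position, protospacer, pam)
```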
Preparation and inputs
Extensive research and selection of targeted genes and regulatory regions in human joint cartilage
design of guide RNA
selection of dCas9-activator
Inputs: DNA templates, plasmids or viral vectors encoding the dCas9-activator, plasmids for gRNAs, patient-derived MSCs
Limitations
Although dCas9 does not cut DNA, off-target binding can still upregulate unintended genes
Upregulation has to be controlled, since over-activation of these genes can lead to fibrosis or abnormal tissue growth
Published Paper: Fabrication of cell culture hydrogels by robotic liquid handling automation for high-throughput drug testing (Torchia et al., 2024).
Description
This paper addresses the difficulty of manual hydrogel fabrication, which is often prone to human error and low reproducibility due to the viscosity of the materials. The authors used an Opentrons OT-2 to automate the mixing and deposition of various hydrogel precursors (including methacrylated gelatin and others) into 96-well plates.
Relevance
The Opentrons OT-2 will be essential for the chemical formulation of the Bio-Blocks. Because the effectiveness of dissolution depends on the precise concentration of hexametaphosphate and citrate, the robot will be used to:
Generate concentration gradients of alginate, HMPs and citrate
Ensure consistency by automating the addition of cross-linking agents
3D-printed holders and custom hardware would be developed for molding structural blocks
Bilayer hydrogels can be created by using the robot to deposit a “structural layer” with high cross-linking density
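Before writing the robot protocol, the volumes for a concentration gradient can be planned with C1·V1 = C2·V2. The sketch below is a plain-Python planning helper, not an Opentrons protocol; the stock concentration, well count, and final volume are hypothetical example values.

```python
def dilution_volumes(stock_conc, target_conc, final_volume_ul):
    """Return (stock_ul, diluent_ul) needed to reach target_conc, using C1*V1 = C2*V2."""
    if not 0 <= target_conc <= stock_conc:
        raise ValueError("target concentration must be between 0 and the stock concentration")
    stock_ul = target_conc / stock_conc * final_volume_ul
    return stock_ul, final_volume_ul - stock_ul


def linear_gradient(stock_conc, n_wells, final_volume_ul):
    """Volumes for n_wells evenly spaced concentrations from 0 up to stock_conc."""
    step = stock_conc / (n_wells - 1)
    return [dilution_volumes(stock_conc, i * step, final_volume_ul) for i in range(n_wells)]


# Hypothetical example: 100 mM citrate stock, 5 wells, 200 µL per well.
for well, (stock_ul, diluent_ul) in enumerate(linear_gradient(100, 5, 200), start=1):
    print(f"well {well}: {stock_ul:.0f} µL stock + {diluent_ul:.0f} µL diluent")
```

The resulting volume pairs could then be fed to the robot's transfer steps, after checking that each volume is within the range the chosen pipette can dispense.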
Week 4 HW: Protein Design
Part A
Answer any NINE of the following questions from Shuguang Zhang: (i.e. you can select two to skip)
How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)
Since meat is not entirely made of protein, let’s assume that protein makes up 20% of the meat mass, i.e. around 100 g. An amino acid is ~100 Da (= ~100 g/mol).
100 g/ (100 g/mol) = 1 mol = 6.022* 10^23 AA.
Why do humans eat beef but do not become a cow, eat fish but do not become fish?
Proteins in meat are denatured in the stomach by HCl, and the enzyme pepsin cuts the long polypeptides. Proteases continue cutting these into smaller peptides, and intestinal enzymes complete the digestion into amino acids.
In short, our bodies do not absorb animal proteins whole, but use different enzymes to break them down into basic amino acids.
Why are there only 20 natural amino acids?
The 20 amino acids represent a balance between biological efficiency and chemical necessity: they provide enough chemical diversity to build all known life on Earth.
Where did amino acids come from before enzymes that make them, and before life started?
Amino acids were synthesized abiotically through high-energy interactions between simple gases, for example electrical discharges like those simulated in the Miller–Urey experiment.
If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
α-Helices made of L-amino acids are right-handed because of steric hindrance between side chains. Since the D-enantiomer is the mirror image of the L-enantiomer, we would expect a left-handed helix.
Can you discover additional helices in proteins?
New helical structures continue to be identified, helped by structure-prediction tools like AlphaFold.
Why are most molecular helices right-handed?
Because L-amino acids dominate in life, and because of their chirality, the right-handed form is sterically and energetically favourable for most helices.
Why do β-sheets tend to aggregate?
What is the driving force for β-sheet aggregation?
Beta sheets are characterised by their open structure, where the carbonyl and amide groups are exposed at the edges. This exposure promotes hydrogen bonding with neighbouring strands, which forms a stack of “sheets”.
Why do many amyloid diseases form β-sheets?
Because of the stacking nature of beta-sheets, amyloid diseases occur when proteins misfold into flat, “sticky” layers that act as templates, forcing other healthy proteins to aggregate into insoluble, thread-like fibrils. This chain reaction keeps recruiting new proteins, and the resulting fibrils are resistant to clearance mechanisms.
Can you use amyloid β-sheets as materials?
This mechanism, though, can be quite beneficial for biomaterials. Beta-sheets provide extreme stability and high tensile strength for biomaterials such as vascular grafts in medicine, which need to withstand conditions inside the body.
Part B
Briefly describe the protein you selected and why you selected it.
For this part, I have selected Clathrin Heavy Chain (CHC).
This protein is widely known in biology as a self-assembling protein: three heavy chains, each with an associated light chain, join into a triskelion. Triskelions then assemble into a closed geometric cage that coats vesicles.
Identify the amino acid sequence of your protein.
How long is it? What is the most frequent amino acid? You can use this Colab notebook to count the frequency of amino acids.
Protein: Clathrin Heavy Chain 2 (Human).
Length: 1,645 amino acids.
Most Frequent Amino Acid: Leucine (L) - appears 196 times.
A BLAST search revealed homology with other clathrins, confirming that it belongs to the clathrin heavy chain family.
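The amino-acid count can also be reproduced outside the Colab notebook with a few lines. This is a minimal sketch; the sequence shown is a short made-up peptide, not the clathrin heavy chain sequence, which would be pasted in its place.

```python
from collections import Counter


def amino_acid_frequencies(sequence):
    """Return (amino_acid, count) pairs sorted from most to least frequent."""
    return Counter(sequence.upper()).most_common()


toy_sequence = "MLLKALLDE"  # hypothetical peptide used only for illustration
frequencies = amino_acid_frequencies(toy_sequence)
most_common_aa, count = frequencies[0]
print(f"length = {len(toy_sequence)}, most frequent = {most_common_aa} ({count} times)")
```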
Identify the structure page of your protein in RCSB
PDB ID: 1XI4
The resolution is 2.30 Å, which is better (smaller) than the 2.70 Å
It was published in 2004 and apart from the protein, the structure contains water molecules and glycerol
Classification: It belongs to the 7-bladed beta-propeller family.
I analyzed clathrin D6 coat (PDB 1XI4), focusing on one heavy‑chain leg (chain A).
I viewed it as cartoon/ribbon, which shows a long curved backbone made almost entirely of α‑helices with only short loops.
I also added ball‑and‑stick on top of the ribbon to see individual atoms and side chains.
Coloring by secondary structure (helices red, sheets yellow, loops green) showed that chain A is strongly helix‑rich, with almost no β‑sheets.
Coloring by residue type (hydrophobic yellow; acidic red; basic blue; polar cyan) revealed that hydrophobic residues are mostly buried in the helical core, while charged/polar residues are on the surface.
When I displayed the surface, I saw grooves and shallow cavities between helices rather than one deep pocket, suggesting multiple shallow binding/interaction sites along the leg.
Based on the heatmap I got for my protein, I navigated to the locations where there was a sharp contrast between highly sensitive sites (dark blue) and tolerated mutations (yellow). I identified three locations (residues) that stood out by being next to dark blue. These yellow spots (see photo below) represent permissive mutations: specific amino acid substitutions that the language model predicts will preserve the protein’s structural and functional integrity despite lying in highly conserved regions.
Latent Space Analysis
In my Latent Space Analysis, my protein (Human Tyrosinase) appeared within the class of All-Alpha protein neighborhood, which makes biological sense because both proteins share a conserved di-copper binding fold. This shows that the ESM2 model can accurately group proteins by their 3D shape and evolutionary ’language,’ even if they come from completely different species.
I changed the code a bit so that my protein would be visible among the thousands of dots.
C2. Protein Folding
I chose a random protein I found on the ESM Metagenomic Atlas for my protein folding task. Amino acid sequence:
A few mutations did not change the 3D structure at all, so I performed a ‘stress test’ on my protein by changing a large stretch of amino acids from position 170 to 200. The 3D model showed a minor conformational change:
C3. Protein Generation
After performing inverse folding with ProteinMPNN, I obtained the following sequence:
The resulting sequence was 232 AA long, compared to the original 287 AA.
After inputting this sequence into ESMFold, the following 3D structure formed:
Comparing the two, we can see that it differs from the original structure but shares some similarities.
Original
New
Part D
Project Proposal
Chosen Goals
Increase toxicity (lytic efficiency) of the MS2 L protein by tuning its interaction with E. coli DnaJ and its putative target.
Improve thermal and conformational stability of L so that toxic variants remain well folded and functional across experimental conditions.
Computational approach
Protein language models (ESM-2 / ProGen) - to design
Run in silico mutagenesis on the MS2 L sequence to score single and small combinatorial substitutions for evolutionary “fitness” and tolerated diversity.
Use these scores to (i) preserve positions that are highly conserved or known to be essential for lysis and DnaJ dependency, and (ii) explore mutations at more flexible residues that may enhance toxicity or stability.
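The enumeration step of the in silico mutagenesis can be sketched in plain Python. This is only a minimal sketch: it lists every single substitution outside a set of protected positions, and leaves the actual scoring to whichever protein language model is used. The function name `single_mutants`, the toy sequence, and the protected positions are all hypothetical placeholders, not the real MS2 L sequence or its essential residues.

```python
# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def single_mutants(seq, protected=frozenset()):
    """Yield (label, mutant_sequence) for every single substitution.

    Positions are 1-based. Positions listed in `protected` (for example,
    residues assumed essential for lysis) are left untouched.
    """
    for i, wild_type in enumerate(seq):
        position = i + 1
        if position in protected:
            continue
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                yield f"{wild_type}{position}{aa}", seq[:i] + aa + seq[i + 1:]

# Hypothetical 5-residue toy sequence, NOT the MS2 L protein.
toy_seq = "MKTLS"
mutants = list(single_mutants(toy_seq, protected={4, 5}))

# 3 unprotected positions x 19 alternative residues each.
print(len(mutants))
print(mutants[0])
```

Each mutant sequence produced this way would then be passed to the language model for scoring, and only the top-scoring candidates carried forward to structure prediction.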
Structure prediction (AlphaFold-Multimer or AlphaFold3)
Model the complex between full-length MS2 L and E. coli DnaJ, using the experimentally defined minimal lytic domain and the N‑terminal basic regulatory domain as guides.
Map the predicted binding interface around residues implicated in DnaJ dependence and inactivating missense mutations (for example, the conserved Leu48–Ser49 motif and neighboring central-domain residues).
Use these models to prioritize mutations predicted to strengthen productive L–DnaJ contacts or relieve autoinhibition of L while maintaining membrane association.
Sequence redesign for stability (ProteinMPNN, Foldseek/NGL/PyMOL)
For promising L variants from the pLM and AlphaFold stages, use ProteinMPNN on fixed backbones to propose alternative side chains that lower the estimated folding free energy (ΔG) without disrupting the DnaJ-contact surface.
Visualize candidate designs in NGL Viewer or PyMOL to check for clashes, loss of transmembrane character, or obvious disruption of the domain architecture suggested by mutational analysis.
Potential pitfalls and limitations
Mutations that increase toxicity may destabilize the protein or alter its membrane topology, leading to misfolding or loss of function.
DnaJ conformational flexibility: chaperones like DnaJ are inherently flexible. A static AlphaFold model might not capture the dynamics relevant to lysis and could produce false positives.
Pipeline Schematic
Input: Wild-type MS2 L Protein Sequence.
Step 1 (Optimization): ESM-2 mutation scoring for fitness.
Step 2 (Binding): AlphaFold-Multimer modeling of L-Protein + DnaJ complex.
Step 3 (Refinement): ProteinMPNN sequence redesign for thermal stability.
Output: Top 5 candidate sequences for in vitro synthesis and plaque assay testing.
Week 5 HW: Protein Design. Part 2
Part A
Some background:
Mutations in SOD1 cause familial Amyotrophic Lateral Sclerosis (ALS)
ALS is a heterogeneous, severe neurodegenerative disorder, the hallmark of which is an adult-onset loss of upper and lower motor neurons.
It leads to a progressive paresis and atrophy of skeletal muscles, resulting in quadriplegia and fatal respiratory failure.
The mutation subtly destabilizes the N-terminus, perturbs folding energetics, and promotes toxic aggregation.
Task: Design short peptides that bind mutant SOD1 & then decide which ones are worth advancing toward therapy.
Using the Colab notebook, 4 peptides (12 AA in length) were generated:
| Index | Binder | Perplexity |
| --- | --- | --- |
| 0 | KHYPVAAVELKK | 13.056877 |
| 1 | KLYYPTALEWKK | 20.008396 |
| 2 | WLYPATVLALGK | 11.976275 |
| 3 | WRYGVVVAAHKK | 9.597134 |
| control | FLYRWLPSRRGG | |
Based on the results, the best candidate for a future drug therapy would be WRYGVVVAAHKK, with a reported score of 8.06 and the lowest perplexity of the four generated peptides (9.597134), indicating that the model has high confidence in its binding potential relative to the benchmark.
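The ranking behind this choice can be reproduced in a few lines of Python. This is a small sketch that only sorts the four perplexity values listed above (lower perplexity meaning the model finds the sequence more plausible). It does not recompute anything from the notebook, and the control peptide is left out because no perplexity was reported for it.

```python
# Perplexity values copied from the results above.
candidates = {
    "KHYPVAAVELKK": 13.056877,
    "KLYYPTALEWKK": 20.008396,
    "WLYPATVLALGK": 11.976275,
    "WRYGVVVAAHKK": 9.597134,
}

# Sort ascending: the lowest perplexity comes first.
ranked = sorted(candidates.items(), key=lambda item: item[1])

for rank, (peptide, perplexity) in enumerate(ranked, start=1):
    print(f"{rank}. {peptide}  perplexity = {perplexity:.2f}")

best = ranked[0][0]
```

Perplexity alone is only one criterion. Binding predictions and properties such as solubility would still need to be checked before advancing a peptide.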
Week 6 HW: Genetic Circuits Part I: Assembly Technologies
DNA Assembly
What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
Phusion DNA Polymerase
Chimeric enzyme that catalyzes the synthesis of a new DNA strand in the 5′ → 3′ direction with high fidelity
dNTPs
Four chemical building blocks (dATP, dTTP, dCTP, dGTP) used to construct the DNA. They provide both the physical material and the energy required for the polymerase to grow the new strand
Reaction Buffer
Maintains the optimal pH and ionic environment for the reaction. It ensures the enzyme remains stable and functional throughout the high-temperature cycles of PCR.
Magnesium Chloride
Co-factor for the polymerase enzyme. Without magnesium ions, the enzyme cannot catalyze the chemical reaction needed to link the DNA building blocks together
Additives & Stabilizers
Chemicals like glycerol or detergents protect the enzyme from degradation. Their purpose is to keep the master mix stable during storage and prevent the proteins from sticking to the plastic tube walls.
What are some factors that determine primer annealing temperature during PCR?
The primer annealing temperature ensures primers stick specifically to the target DNA. Main factors are:
Primer Length and Composition: longer primers and a higher ratio of G-C to A-T bases raise the melting temperature (G-C pairs have three hydrogen bonds, A-T pairs only two)
Primer Concentration: higher concentration can increase binding
Salt concentration: cations like K+ and Mg2+ shield the negatively charged DNA backbone and stabilize the duplex, which increases the melting temperature
Base Mismatches
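To make the effect of length and G-C content concrete, here is a rough sketch using the Wallace rule, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C). This is a simple rule of thumb for short oligos, not the method any particular vendor calculator uses. It ignores salt and primer concentration, so real annealing temperatures should come from a proper Tm calculator. The primer sequence below is a made-up example.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in the primer (0.0 to 1.0)."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)

def wallace_tm(primer):
    """Rough melting temperature in degrees C using the Wallace rule.

    Each A or T contributes 2 degrees and each G or C contributes 4,
    reflecting the extra hydrogen bond in a G-C pair.
    """
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

# Hypothetical 18-nt primer, for illustration only.
primer = "ATGCGTACGTTAGCCTAG"
print(f"GC content: {gc_fraction(primer):.0%}")
print(f"Estimated Tm: {wallace_tm(primer)} C")
```

Swapping a few A/T bases for G/C, or adding bases to the primer, raises the estimate, which matches the first factor in the list.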
There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.
Both PCR and Restriction Enzyme Digests are fundamental techniques for generating DNA fragments; they differ in how they generate these fragments.
PCR
requires a template of DNA
uses heat and a polymerase to synthesize new copies of a specific region
main components are primers, dNTPs, DNA polymerase, and thermal cycler
depends on the designed primers
Use when only a small amount of template DNA is available, or when we want to create a fragment with very specific, custom endpoints and length
Restriction Enzyme Digest
requires a high concentration of purified DNA
uses molecular scissors to physically cut existing DNA
main components are restriction enzymes and an incubator held at a stable temperature
depends on the presence of specific recognition sequences (sites)
Use when we want to cut a gene or insert out of a circular plasmid to move it to another vector, when we want to check a piece of DNA (diagnostic digest), or when we know the restriction sites already exist
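The dependence of a digest on existing recognition sites can be illustrated with a small in silico digest. This is a simplified sketch: it handles a linear sequence, searches only the top strand, and reports fragment lengths from the top-strand cut positions. EcoRI (G^AATTC) is used as the example enzyme, and the toy sequence is invented for illustration.

```python
def digest_linear(seq, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence.

    `site` is the recognition sequence and `cut_offset` is the number of
    bases from the start of the site to the top-strand cut
    (1 for EcoRI, which cuts G^AATTC).
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    # Fragment boundaries: sequence start, every cut, sequence end.
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]

# Invented 24-bp toy sequence containing two EcoRI sites.
toy = "AAAA" + "GAATTC" + "TTTTTT" + "GAATTC" + "CC"
fragments = digest_linear(toy, "GAATTC", cut_offset=1)
print(fragments)
```

If the sequence contains no site, the function returns a single fragment equal to the full length. That is the situation where PCR with designed primers becomes the better option.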
How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?
To ensure DNA fragments are ready for Gibson Assembly, we have to focus on the ends of the sequences, since Gibson uses overlapping DNA sequences, not the 'sticky ends' from restriction enzymes.
Check for overlapping ends - each fragment must share an identical sequence with the fragment next to it.
Verify clean ends - since Gibson relies on an exonuclease chewing back the fragment ends, we must ensure that PCR products carry no extra A overhangs and that restriction digests have gone to completion.
Check the chemical environment - remove leftover polymerase, dNTPs, and salts from the PCR reaction
Sequence accuracy - verify the final assembled plasmid by Sanger sequencing across the junction points
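The overlap check in the first step can be done in silico before ordering primers. Below is a minimal sketch that tests whether the end of each fragment matches the start of the next one over a minimum length. The 20-bp threshold is an assumption chosen for illustration (the required overlap depends on the kit and protocol), and the fragment sequences are invented.

```python
def shared_overlap(left, right):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for n in range(min(len(left), len(right)), 0, -1):
        if left[-n:] == right[:n]:
            return n
    return 0

def check_junctions(fragments, min_overlap=20, circular=True):
    """Return one True/False per junction: does the overlap reach min_overlap?

    With circular=True the last fragment must also overlap the first,
    as needed to close a plasmid.
    """
    pairs = list(zip(fragments, fragments[1:]))
    if circular:
        pairs.append((fragments[-1], fragments[0]))
    return [shared_overlap(a, b) >= min_overlap for a, b in pairs]

# Invented 20-bp overlaps and toy fragments, for illustration only.
ov_ab = "GATTACAGATTACAGATTAC"
ov_bc = "CCGGAATTCCGGAATTCCGG"
frag_a = "TTTTTTTTTT" + ov_ab
frag_b = ov_ab + "AAAAAAAAAA" + ov_bc
frag_c = ov_bc + "GGGGGGGGGG"

print(check_junctions([frag_a, frag_b, frag_c], circular=False))
print(check_junctions([frag_a, frag_b, frag_c], circular=True))
```

In the circular check, the last junction fails because `frag_c` shares no sequence with `frag_a`. This is the kind of design error worth catching before the assembly reaction.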
How does the plasmid DNA enter the E. coli cells during transformation?
The process is called Transformation.
Steps:
Preparation - before DNA enters, the cells are soaked in a calcium chloride solution so that Ca2+ ions neutralize the negative charges of the DNA and the cell membrane, allowing them to get close to each other.
Entry point = SHOCK - once mixed, the cells and DNA are moved to a 42 °C water bath for 30-60 seconds to create a temporary “pressure difference” and transient pores in the cell membrane. Plasmid DNA is swept in through these pores.
Recovery - put the cells back on ice to seal the pores.
Describe another assembly method in detail (such as Golden Gate Assembly)
Golden Gate Assembly is a molecular cloning method that allows for the simultaneous, “one-pot” assembly of multiple DNA fragments using Type IIS restriction enzymes and T4 DNA ligase. Unlike standard Type II restriction enzymes, Type IIS enzymes (like BsaI) cut outside of their recognition sites, creating 4-base overhangs that can be customized to dictate the assembly order. Because the recognition sites are placed at the very ends of the fragments and are “cut off” during the reaction, the final product is seamless and lacks the original restriction sites, preventing the enzyme from re-cutting the finished plasmid. This “scarless” assembly is highly efficient and can remain accurate even when joining ten or more fragments at once, provided the overhangs are well designed. The entire process occurs in a single tube through a series of temperature cycles that alternate between the optimal conditions for digestion and ligation.
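The way a Type IIS enzyme defines its overhang can be sketched in code. BsaI recognizes GGTCTC and cuts one base downstream on the top strand and five bases downstream on the bottom strand, leaving a 4-nt overhang. The sketch below reads that overhang from a forward-oriented site and checks that a set of overhangs is unique and non-palindromic. It does not handle reverse-oriented sites (GAGACC), and the example sequence and overhang sets are invented.

```python
BSAI_SITE = "GGTCTC"

def revcomp(seq):
    """Reverse complement of a DNA sequence (uppercase ACGT only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def bsai_overhang(seq):
    """Return the 4-nt overhang left by the first forward BsaI site.

    BsaI cuts GGTCTC N^NNNN on the top strand, so the overhang is the
    four bases starting one base after the recognition site.
    """
    i = seq.find(BSAI_SITE)
    if i == -1:
        return None
    start = i + len(BSAI_SITE) + 1
    return seq[start:start + 4]

def overhangs_ok(overhangs):
    """True if every overhang is unique and none is self-complementary."""
    unique = len(set(overhangs)) == len(overhangs)
    no_palindromes = all(o != revcomp(o) for o in overhangs)
    return unique and no_palindromes

# Invented example: site, one spacer base, then the overhang AATG.
print(bsai_overhang("TTGGTCTCAAATGCCCC"))
print(overhangs_ok(["AATG", "GCTT", "CGAA"]))
print(overhangs_ok(["AATG", "AATT"]))
```

A palindromic overhang such as AATT can ligate to a second copy of itself, which is why it is rejected here. Unique overhangs are what enforce the intended fragment order.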