<Olga Mineyeva> — HTGAA Spring 2026

cover image

About me

I previously worked in neurophysiology, studying the neural stem/progenitor cell lifecycle, traumatic brain injury, neuromodulation, and behavioural flexibility in rodent models. I am now interested in neurodevelopmental and neurodegenerative disorders, in optimizing disease signature extraction in complex tissue-culture models, and in culture platform optimization and workflow automation.

Contact info

Homework

Labs

Projects

Subsections of <Olga Mineyeva> — HTGAA Spring 2026

Homework

Weekly homework submissions:

  • Week 1 HW: Principles and Practices

    Class Assignment Describe a biological engineering application or tool you want to develop and why. The project aims to develop a tool to promote Parkinson’s disease phenotype manifestation in human brain organoids by controllable induction of alpha-synuclein protein expression in dopaminergic neurons. The tool is a genetic construct containing switches and regulators to produce alpha-synuclein beyond normal levels in a subpopulation of cells in patient-derived brain organoids for the investigation of patient-specific pathogenic mechanisms, pathways, and phenotypes.

  • Week 2 HW: DNA Read, Write, and Edit

    Part 1: Benchling & In-silico Gel Art This virtual digest of phage Lambda DNA was performed in Benchling. The enzymes used to process the DNA are listed below each column. Part 3: DNA Design Challenge I chose to practice designing a fluorescently tagged human tyrosine hydroxylase (TH), relevant for my project on Parkinson’s disease. Tyrosine hydroxylase converts tyrosine to L-DOPA, the rate-limiting step of dopamine synthesis, and is an essential marker for my target cell population, dopaminergic neurons. In real applications, dopaminergic neurons are usually traced with GFP under the TH promoter, because fusing GFP to the large (~56 kDa) TH can disrupt tetramerization, enzymatic activity, and folding; I nevertheless chose to design a TH-GFP construct for training purposes.

  • Week 3 HW: Lab Automation

    Part 1: Opentrons Artwork This design was generated using the GUI at opentrons-art.rcdonovan.com and can be accessed through https://opentrons-art.rcdonovan.com/?id=1s7h4g7m1kn174o Coordinates mrfp1_points = [(27.5, 25.3),(25.3, 23.1),(23.1, 18.7),(20.9, 16.5),(18.7, 14.3),(36.3, 5.5),(38.5, 5.5),(27.5, 3.3),(29.7, 3.3),(31.9, 3.3),(34.1, 3.3),(36.3, 3.3),(20.9, 1.1),(23.1, 1.1),(25.3, 1.1),(16.5, -1.1),(18.7, -1.1)] mscarlet_i_points = [(29.7, 25.3),(27.5, 23.1),(29.7, 23.1),(25.3, 20.9),(27.5, 20.9),(25.3, 18.7),(23.1, 16.5),(20.9, 14.3),(16.5, 12.1),(18.7, 12.1),(16.5, 9.9),(14.3, 7.7),(38.5, 7.7),(12.1, 5.5),(34.1, 5.5),(9.9, 3.3),(38.5, 3.3),(14.3, -1.1)] electra2_points = [(31.9, 23.1),(29.7, 20.9),(31.9, 20.9),(34.1, 20.9),(27.5, 18.7),(29.7, 18.7),(31.9, 18.7),(25.3, 16.5),(27.5, 16.5),(29.7, 16.5),(23.1, 14.3),(25.3, 14.3),(27.5, 14.3),(20.9, 12.1),(23.1, 12.1),(18.7, 9.9),(20.9, 9.9),(16.5, 7.7),(18.7, 7.7),(14.3, 5.5),(16.5, 5.5),(12.1, 3.3),(9.9, 1.1),(9.9, -1.1)] mturquoise2_points = [(34.1, 18.7),(-5.5, 16.5),(31.9, 16.5),(34.1, 16.5),(36.3, 16.5),(29.7, 14.3),(31.9, 14.3),(34.1, 14.3),(25.3, 12.1),(27.5, 12.1),(29.7, 12.1),(23.1, 9.9),(25.3, 9.9),(27.5, 9.9),(20.9, 7.7),(23.1, 7.7),(1.1, 5.5),(18.7, 5.5),(14.3, 3.3),(16.5, 3.3),(12.1, 1.1),(23.1, -14.3),(7.7, -16.5),(9.9, -16.5),(12.1, -16.5),(14.3, -16.5),(-7.7, -18.7),(-5.5, -18.7),(-3.3, -18.7),(-1.1, -18.7),(1.1, -18.7),(3.3, -18.7),(5.5, -18.7),(7.7, -18.7),(9.9, -18.7),(12.1, -18.7),(16.5, -18.7),(-9.9, -20.9),(-7.7, -20.9),(-5.5, -20.9),(-3.3, -20.9),(-1.1, -20.9),(1.1, -20.9),(3.3, -20.9),(5.5, -20.9),(7.7, -20.9),(9.9, -20.9),(12.1, -20.9),(14.3, -20.9),(18.7, -20.9),(-12.1, -23.1),(-9.9, -23.1),(-7.7, -23.1),(-5.5, -23.1),(-3.3, -23.1),(-1.1, -23.1),(1.1, -23.1),(3.3, -23.1),(5.5, -23.1),(7.7, -23.1),(9.9, -23.1),(12.1, -23.1),(14.3, -23.1),(16.5, -23.1),(-14.3, -25.3),(-12.1, -25.3),(-9.9, -25.3),(-7.7, -25.3),(-5.5, -25.3),(-3.3, -25.3),(-1.1, -25.3),(1.1, -25.3),(3.3, -25.3),(5.5, 
-25.3),(-16.5, -27.5),(-14.3, -27.5),(-12.1, -27.5),(-9.9, -27.5)] azurite_points = [(-5.5, 14.3),(-3.3, 14.3),(-5.5, 12.1),(-1.1, 12.1),(-5.5, 9.9),(-3.3, 9.9),(-1.1, 9.9),(1.1, 9.9),(-7.7, 7.7),(-5.5, 7.7),(-3.3, 7.7),(-1.1, 7.7),(3.3, 7.7),(-7.7, 5.5),(-5.5, 5.5),(-3.3, 5.5),(-1.1, 5.5),(3.3, 5.5),(5.5, 5.5),(-7.7, 3.3),(-5.5, 3.3),(-3.3, 3.3),(-1.1, 3.3),(1.1, 3.3),(5.5, 3.3),(7.7, 3.3),(-7.7, 1.1),(-5.5, 1.1),(-3.3, 1.1),(-1.1, 1.1),(1.1, 1.1),(5.5, 1.1),(7.7, 1.1),(-9.9, -1.1),(-7.7, -1.1),(-5.5, -1.1),(-3.3, -1.1),(-1.1, -1.1),(1.1, -1.1),(3.3, -1.1),(5.5, -1.1),(7.7, -1.1),(-9.9, -3.3),(-7.7, -3.3),(-5.5, -3.3),(-3.3, -3.3),(-1.1, -3.3),(1.1, -3.3),(3.3, -3.3),(5.5, -3.3),(7.7, -3.3),(9.9, -3.3),(12.1, -3.3),(14.3, -3.3),(9.9, -5.5),(12.1, -5.5),(14.3, -5.5),(16.5, -5.5),(-12.1, -7.7),(-9.9, -7.7),(-7.7, -7.7),(-5.5, -7.7),(-3.3, -7.7),(-1.1, -7.7),(1.1, -7.7),(3.3, -7.7),(5.5, -7.7),(7.7, -7.7),(9.9, -7.7),(12.1, -7.7),(14.3, -7.7),(16.5, -7.7),(18.7, -7.7),(-12.1, -9.9),(-9.9, -9.9),(-7.7, -9.9),(-5.5, -9.9),(-3.3, -9.9),(-1.1, -9.9),(1.1, -9.9),(3.3, -9.9),(5.5, -9.9),(7.7, -9.9),(9.9, -9.9),(14.3, -9.9),(16.5, -9.9),(18.7, -9.9),(20.9, -9.9),(-12.1, -12.1),(-9.9, -12.1),(-7.7, -12.1),(-5.5, -12.1),(-3.3, -12.1),(-1.1, -12.1),(1.1, -12.1),(3.3, -12.1),(5.5, -12.1),(7.7, -12.1),(9.9, -12.1),(12.1, -12.1),(14.3, -12.1),(16.5, -12.1),(18.7, -12.1),(20.9, -12.1),(23.1, -12.1),(-12.1, -14.3),(-9.9, -14.3),(-7.7, -14.3),(-5.5, -14.3),(-3.3, -14.3),(-1.1, -14.3),(1.1, -14.3),(3.3, -14.3),(5.5, -14.3),(7.7, -14.3),(9.9, -14.3),(12.1, -14.3),(16.5, -14.3),(18.7, -14.3),(20.9, -14.3),(-12.1, -16.5),(-9.9, -16.5),(-7.7, -16.5),(-5.5, -16.5),(-3.3, -16.5),(-1.1, -16.5),(1.1, -16.5),(3.3, -16.5),(5.5, -16.5),(16.5, -16.5),(18.7, -16.5),(20.9, -16.5),(-12.1, -18.7),(-9.9, -18.7),(14.3, -18.7),(18.7, -18.7),(-14.3, -20.9),(-12.1, -20.9),(16.5, -20.9),(-14.3, -23.1),(-16.5, -25.3)] sfgfp_points = [(36.3, 14.3),(31.9, 12.1),(34.1, 12.1),(36.3, 12.1),(29.7, 9.9),(31.9, 
9.9),(25.3, 7.7),(27.5, 7.7),(20.9, 5.5),(23.1, 5.5),(18.7, 3.3),(14.3, 1.1)] venus_points = [(34.1, 9.9),(36.3, 9.9),(29.7, 7.7),(25.3, 5.5),(20.9, 3.3),(16.5, 1.1)] mko2_points = [(-36.3, 12.1),(-38.5, 9.9),(-36.3, 9.9),(-34.1, 9.9),(38.5, 9.9),(-38.5, 7.7),(-36.3, 7.7),(-34.1, 7.7),(-31.9, 7.7),(31.9, 7.7),(34.1, 7.7),(36.3, 7.7),(-34.1, 5.5),(-31.9, 5.5),(-29.7, 5.5),(27.5, 5.5),(29.7, 5.5),(31.9, 5.5),(-29.7, 3.3),(-27.5, 3.3),(-25.3, 3.3),(23.1, 3.3),(25.3, 3.3),(-25.3, 1.1),(-23.1, 1.1),(-20.9, 1.1),(18.7, 1.1),(-20.9, -1.1),(-18.7, -1.1),(-16.5, -1.1),(12.1, -1.1),(-14.3, -3.3)] mjuniper_points = [(-3.3, 12.1),(1.1, 7.7),(3.3, 3.3),(3.3, 1.1),(-9.9, -5.5),(-7.7, -5.5),(-5.5, -5.5),(-3.3, -5.5),(-1.1, -5.5),(1.1, -5.5),(3.3, -5.5),(5.5, -5.5),(7.7, -5.5),(12.1, -9.9),(14.3, -14.3)] Part 2: Post-Lab Questions Part 3: Final Project Ideas Project 1: Tunable Induction of Alpha-Synuclein Expression for Modeling Parkinson’s Disease Aim:

  • Week 4: Protein Design Part I

    Part A. Conceptual Questions Q1: How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons) 1 Da equals 1.66053906892(52)×10⁻²⁷ kg, so 1 aa is 1.66053906892(52)×10⁻²⁵ kg. The average protein fraction in meat is ~20%. Therefore, the total amount of protein is 100 g. The number of aa in 100 g = 0.1 kg of protein is ~6×10²³, i.e. Avogadro’s number, or 1 mole.

  • Week 5: Protein design, part II

    Part C: Final Project: L-Protein Mutants (variable sites underlined) Original Sequence Soluble N-terminal domain C-terminal domain METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSST LYVLIFLAIFLS KFTNQLLLSLLEAVIRTVTTLQQLLT Variable sites identified aligning BLAST results in ClustalOmega (8 in the N-terminus and 4 in the transmembrane domain highlighted): METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSST LYVLIFLAIFLS KFTNQLLLSLLEAVIRTVTTLQQLLT Mutated Sequence 1 METRFPRQSQETLESTNRRRPRKHEDYPCRRQQRSST LYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT For this mutant, I modified the N-terminal domain, aiming to stabilize the disordered domain. I introduced as many charged pairs as possible in the variable sites (changed 4 out of 8 in the N-terminal domain), and additionally changed one conserved site on the left side of the 2nd pair. Summary of mutations

Subsections of Homework

Week 1 HW: Principles and Practices

Class Assignment

  1. Describe a biological engineering application or tool you want to develop and why.

The project aims to develop a tool to promote Parkinson’s disease phenotype manifestation in human brain organoids by controllable induction of alpha-synuclein protein expression in dopaminergic neurons. The tool is a genetic construct containing switches and regulators to produce alpha-synuclein beyond normal levels in a subpopulation of cells in patient-derived brain organoids for the investigation of patient-specific pathogenic mechanisms, pathways, and phenotypes.

Patient-derived brain organoids naturally recapitulate some features of neurodegenerative and neurodevelopmental diseases and are used as models to study human-specific pathology and to test potential therapeutics. A significant and costly problem with these models is the time (months) needed for organoids to mature and for pathological phenotypes to manifest, including protein accumulation, mitochondrial dysfunction, and neuronal death. Approaches that speed up growth, maturation, and phenotype development are therefore needed and are being devised. In Parkinson’s disease in particular, the death of dopaminergic neurons that leads to movement deficits is driven by alpha-synuclein misfolding and accumulation, triggered by failures in several interconnected processes (mitochondrial dysfunction, dopamine metabolism, inflammation, autophagy dysfunction) and by mutations in various genes (SNCA, LRRK2, PINK1, Parkin, DJ-1, GBA). A tool for early, controllable production of alpha-synuclein will allow standardized promotion of the Parkinson’s disease phenotype. It can then be used to study individual, patient-specific factors leading to the disease (specific dysfunctions that cause inefficient protein degradation and alpha-synuclein accumulation) and to devise personalized treatment strategies within platforms that work with patient-derived brain organoids/assembloids.

  2. Describe one or more governance/policy goals related to ensuring that this application or tool contributes to an “ethical” future, like ensuring non-malfeasance (preventing harm). Break big goals down into two or more specific sub-goals.

One goal relates to environmental and resource considerations. The tool may substantially reduce the time and resources required for Parkinson’s research, and these benefits need to be documented and promoted. The sub-goals are therefore (1) systematic quantification and validation of the resource efficiency of this method for accelerated Parkinson’s phenotype modeling, and (2) dissemination of the results to promote the adoption of sustainable research practices and to inform resource allocation decisions in neurodegenerative research and drug discovery.

  3. Describe at least three different potential governance “actions” by considering the four aspects below (Purpose, Design, Assumptions, Risks of Failure & “Success”). Try to outline a mix of actions (e.g. a new requirement/rule, incentive, or technical strategy) pursued by different “actors” (e.g. academic researchers, companies, federal regulators, law enforcement, etc). Draw upon your existing knowledge and a little additional digging, and feel free to use analogies to other domains (e.g., 3D printing, drones, financial systems, etc.).

Action 1: Conduct and share the analysis of the process

Purpose: measure the costs and environmental footprint of the tool versus older approaches, and publish the results.

Design: data needs to be collected by a lab; environmental specialists, including institutional ones, need to be involved to share their expertise on the methodology and the analysis, as well as consider both relative and absolute environmental impact; additional funding can be requested from the Parkinson’s Foundation.

What could be wrong, incorrect assumptions: the environmental and resource benefits may not be sufficient, or may only be comparable to those of older protocols like fibril seeding; there could be hidden costs for monitoring or equipment that cancel out the saved time; when these organoids are produced at scale, the dynamics and the resources needed may differ from those at lab scale.

Risks: initial monitoring, as well as monitoring in the labs that adopt this method, may require considerable investment that can be hard to secure, so the tool may go unmonitored and become less preferred; the results of monitoring may show that the method does not save time or resources but is instead more resource-intensive.

Action 2: Develop and promote a framework for resource efficiency reporting

Purpose: create a standardized method for reporting the use of resources that can become a standard for the field and include environmental measures into research workflow. This could be a template or checklist to increase transparency and improve reproducibility across studies and labs.

Design: tool developers need to agree on metrics and share the standard with agencies, societies, and databases that could adopt and promote it (SFN, CAN, Stem Cell Research foundations, etc.); software developers can create tools that make reporting data efficient and easy for the tool users.

What could be wrong, incorrect assumptions: the reporting burden might be too large for researchers to comply with; standardization may not be possible across different research environments; the chosen metrics may fail to capture real efficiency gains.

Risks: users may refuse to adopt the standard, or may ignore it, because it’s too complex, expensive, or not enforced.

Action 3: Develop an open-access resource optimization database and tools

Purpose: create a community-maintained platform where researchers working with organoids and the developed tool share their protocols, techniques that save resources, troubleshooting, quality control, and cost-benefit analysis.

Design: this database can be part of an already existing open-access platform for organoid research, such as the one maintained by the Early Drug Discovery Unit at McGill University. Tool developers need to develop the initial database and report their content there. Contributing researchers need to volunteer time to share their data and protocols; the university needs to approve sharing potentially patentable information; moderators are required to guide and encourage participation. 

What could be wrong, incorrect assumptions: publications may already be sufficient for reporting protocols and sustainability gains, so that no new resource is needed; efficiency improvements in different contexts may not translate to universal strategies.

Risks: the resource can become outdated and unused; companies might take advantage of it to promote their reagents; funding may not be sufficient; researchers may choose not to share the most valuable information; bad protocols might be propagated.
  4. Score (from 1-3, with 1 as the best, or n/a) each of your governance actions against your rubric of policy goals. The following is one framework but feel free to make your own:

Does the option:                                   | Option 1 | Option 2 | Option 3
Enhance Biosecurity                                |          |          |
• By preventing incidents                          | 3        | 3        | 2
• By helping respond                               | 3        | 3        | 2
Foster Lab Safety                                  |          |          |
• By preventing incidents                          | 3        | 2        | 1
• By helping respond                               | 3        | 3        | 1
Protect the environment                            |          |          |
• By preventing incidents                          | 2        | 1        | 3
• By helping respond                               | 3        | 2        | 1
Other considerations                               |          |          |
• Minimizing costs and burdens to stakeholders     | 2        | 3        | 1
• Feasibility?                                     | 1        | 3        | 2
• Not impede research                              | 1        | 3        | 2
• Promote constructive applications                | 2        | 1        | 3
  5. Drawing upon this scoring, describe which governance option, or combination of options, you would prioritize, and why. Outline any trade-offs you considered as well as assumptions and uncertainties.

Of the three proposed actions, Action 1 (the monitoring) needs to be prioritized because it will generate and assess the evidence for the tool and show whether the other two actions are needed. The data obtained through Action 1 will provide the basis for protocol optimization. Action 1 also depends mainly on the tool developers rather than on other contributors, so it is easier to lead. The main risk of this action is choosing the wrong metrics. In addition, the community of researchers in the field (Parkinson’s or neurodegenerative disorders) may not accept the environmental efficiency of the tool as a priority. If the tool turns out to have no environmental benefit, the lessons on the disease-accelerating approach, the metrics, and the costs should nevertheless be learned and published. One uncertainty is which agencies would be willing to fund this.

  1. Reflecting on what you learned and did in class this week, outline any ethical concerns that arose, especially any that were new to you. Then propose any governance actions you think might be appropriate to address those issues. This should be included on your class page for this week.

I’ve learned about governance for synthetic biology projects in general and about the boundary between technical practice in science and governance. Regarding the latter, the design of controls is part of scientific methodology. But control design can actually become governance if new and specific standards for controls are disseminated, when control requirements are institutionalized, standardized control frameworks are proposed or built onto a tool, minimal control standards are created, or when expectations about reporting controls are established to foster transparency. This is applicable to my project on accelerated disease phenotype manifestation as well, and some related governance actions can be developed on how to ensure proper controls are used. These actions can include 1) establishing minimum control standards both for within-organoid controls (the organoids are intended to be chimeras) and parallel controls with natural manifestation (normally aging organoids); 2) creating a standardized control protocol repository; 3) providing training on the protocols and certifications; 4) other (to be added).

Assignment (Week 2 Lecture Prep)

Homework Questions from Professor Jacobson:

  1. Nature’s machinery for copying DNA is called polymerase. What is the error rate of polymerase? How does this compare to the length of the human genome. How does biology deal with that discrepancy?

The error rate of DNA polymerase depends on the organism and on the type of polymerase. For the human replicative polymerases δ and ε, the error rate before proofreading by their 3’→5’ exonuclease activity is estimated at about one wrong nucleotide per 10⁵ to 10⁶ nucleotides added. Compared with the diploid human genome, which is about 6 billion base pairs long, this means that replication by the polymerase alone would produce about 6×10⁹ × 10⁻⁵ = ~60,000 errors per genome at a rate of 10⁻⁵, which is why proofreading is essential. Biology deals with this discrepancy in two steps: immediate proofreading reduces the error rate by about 1–2 orders of magnitude, and mismatch repair reduces it by roughly another 1–2 orders, bringing the final rate to ~10⁻⁹–10⁻¹⁰. Pols δ and ε proofread their own mistakes immediately by shifting the DNA strand to their exonuclease site. Errors that escape proofreading are corrected right after replication by mismatch repair, which removes a patch around the mismatched pair and involves proteins that recognise the mistake on the new strand, endonucleases and exonucleases that remove the mismatch, polymerase δ that fills the gap with the correct sequence, and ligase that seals the strand.
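The arithmetic behind this answer can be checked with a few lines of Python. This is only a sketch: the genome size and the error rates are the order-of-magnitude figures quoted in the answer, and the function name is my own.

```python
GENOME_BP = 6e9  # diploid human genome length used in the answer above

# Per-nucleotide error rates; orders of magnitude as quoted in the answer
STAGES = {
    "polymerase alone, 1e-5": 1e-5,
    "polymerase alone, 1e-6": 1e-6,
    "after proofreading and mismatch repair, 1e-9": 1e-9,
    "after proofreading and mismatch repair, 1e-10": 1e-10,
}

def expected_errors(rate, genome_bp=GENOME_BP):
    """Expected number of wrong nucleotides per round of genome replication."""
    return rate * genome_bp

for label, rate in STAGES.items():
    print(f"{label}: ~{expected_errors(rate):,.1f} errors per replication")
```

With these inputs, the uncorrected polymerase leaves tens of thousands of errors per replication, while the fully corrected rate leaves only a handful or fewer.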

  2. How many different ways are there to code (DNA nucleotide code) for an average human protein? In practice what are some of the reasons that all of these different codes don’t work to code for the protein of interest?

The number of ways to code for an average human protein equals the average number of codons per amino acid (~3) raised to the power of the average protein length (~400 amino acids): 3⁴⁰⁰ ≈ 7×10¹⁹⁰. In practice, not all of these codes work, and codons are not used with equal likelihood. Some of the reasons: evolution has selected codons, removing sequences that caused problems with mRNA folding or splicing, were translated slowly, or affected protein folding and made the protein non-functional; proofreading corrects mistakes in codons; the polymerase is prone to some errors but not others; and certain mutations accrue non-randomly, with pyrimidine-purine substitutions less likely than purine-purine or pyrimidine-pyrimidine ones.
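To get a feeling for the size of this number, it can be evaluated directly, since Python integers have arbitrary precision. The two constants are the rough averages used in the answer, not measured values.

```python
import math

CODONS_PER_AA = 3     # rough average codon degeneracy used in the answer
PROTEIN_LENGTH = 400  # rough average protein length used in the answer

def log10_encodings(codons_per_aa=CODONS_PER_AA, length=PROTEIN_LENGTH):
    """log10 of the number of synonymous DNA encodings."""
    return length * math.log10(codons_per_aa)

exact = CODONS_PER_AA ** PROTEIN_LENGTH  # exact integer value of 3^400
print(f"3^400 has {len(str(exact))} digits (log10 = {log10_encodings():.2f})")
```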

Homework Questions from Professor LeProust:

  1. What’s the most commonly used method for oligo synthesis currently?

Phosphoramidite synthesis is the most common.

  2. Why is it difficult to make oligos longer than 200nt via direct synthesis?

It’s difficult to synthesise oligos longer than 200 nucleotides because the efficiency of phosphoramidite synthesis decreases with length as errors accumulate. More synthesis cycles mean more opportunities for mistakes: nucleotides may fail to attach, chemical errors may occur during attachment, and inefficient capping may let truncated sequences continue into the next cycles, which complicates purification. The alternatives have their own limits: enzymatic synthesis, which can make longer oligos, is expensive, less standardized, and still under development, and microarray-based synthesis produces shorter sequences.

  3. Why can’t you make a 2000bp gene via direct oligo synthesis?

Such a long synthesis would produce mainly incorrect sequences and only a tiny fraction of correct ones because errors accumulate with every cycle, and extracting the correct ones would be expensive, time-consuming, and overall impractical, if possible at all.
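The effect of length on yield in the two answers above can be sketched with a simple model in which every coupling step succeeds independently with the same probability. The coupling efficiencies below are illustrative assumptions, not measured values for any particular synthesizer, and the model ignores all error types other than failed couplings.

```python
def full_length_fraction(length_nt, coupling_efficiency):
    """Fraction of strands for which every coupling step succeeded."""
    # A strand of n nucleotides needs n - 1 couplings after the first base.
    return coupling_efficiency ** (length_nt - 1)

for eff in (0.99, 0.995, 0.999):   # assumed per-step efficiencies
    for n in (20, 200, 2000):      # primer-sized, 200 nt, and 2000 nt
        frac = full_length_fraction(n, eff)
        print(f"efficiency {eff:.3f}, {n:>4} nt: {frac:.2e} full length")
```

Under these assumptions, the full-length fraction drops exponentially with length, so even a very good per-step efficiency leaves few or almost no correct 2000-nt products.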

Homework Questions from Professor Church:

  1. [Using Google & Prof. Church’s slide #4] What are the 10 essential amino acids in all animals and how does this affect your view of the “Lysine Contingency”?

The 10 essential amino acids in all animals are: Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, Valine, and Arginine. The ‘Lysine Contingency’ was intended to make the dinosaurs dependent on supplied lysine by making them unable to synthesize it. This makes no sense as a safeguard, because no animal synthesizes lysine, regardless of the species used for cloning.

Week 2 HW: DNA Read, Write, and Edit

Part 1: Benchling & In-silico Gel Art

virtual digest

This virtual digest of phage Lambda DNA was performed in Benchling. The enzymes used to process the DNA are listed below each column.

Part 3: DNA Design Challenge

I chose to practice designing a fluorescently tagged human tyrosine hydroxylase (TH), relevant for my project on Parkinson’s disease. Tyrosine hydroxylase converts tyrosine to L-DOPA, the rate-limiting step of dopamine synthesis, and is an essential marker for my target cell population, dopaminergic neurons. In real applications, dopaminergic neurons are usually traced with GFP under the TH promoter, because fusing GFP to the large (~56 kDa) TH can disrupt tetramerization, enzymatic activity, and folding; I nevertheless chose to design a TH-GFP construct for training purposes.

3.1. Choose your protein.

For TH, UniProt P07101 (Tyrosine 3-monooxygenase, Tyrosine 3-hydroxylase (TH)), I chose TH isoform 1 of 528 amino acids as it’s the canonical and most common isoform in the brain.

UniProt P07101-1, 528 aa: MPTPDATTPQAKGFRRAVSELDAKQAEAIMVRGQGAPGPSLTGSPWPGTAAPAASYTPTPRSPRFIGRRQSLIEDARKEREAAVAAAAAAVPSEPGDPLEAVAFEEKEGKAVLNLLFSPRATKPSALSRAVKVFETFEAKIHHLETRPAQRPRAGGPHLEYFVRLEVRRGDLAALLSGVRQVSEDVRSPAGPKVPWFPRKVSELDKCHHLVTKFDPDLDLDHPGFSDQVYRQRRKLIAEIAFQYRHGDPIPRVEYTAEEIATWKEVYTTLKGLYATHACGEHLEAFALLERFSGYREDNIPQLEDVSRFLKERTGFQLRPVAGLLSARDFLASLAFRVFQCTQYIRHASSPMHSPEPDCCHELLGHVPMLADRTFAQFSQDIGLASLGASDEEIEKLSTLYWFTVEFGLCKQNGEVKAYGAGLLSSYGELLHCLSEEPEIRAFDPEAAAVQPYQDQTYQSVYFVSESFSDAKDKLRSYASRIQRPFSVKFDPYTLAIDVLDSPQAVRRSLEGVQDELDTLAHALSAIG

For GFP, I chose EGFP variant GenBank AAB02572 of 239 aa, because of its higher-intensity emission and optimization for 37°C.

GenBank: AAB02572.1 239 aa: MVSKGEELFT GVVPILVELD GDVNGHKFSV SGEGEGDATY GKLTLKFICT TGKLPVPWPT LVTTLTYGVQ CFSRYPDHMK QHDFFKSAMP EGYVQERTIF FKDDGNYKTR AEVKFEGDTL VNRIELKGID FKEDGNILGH KLEYNYNSHN VYIMADKQKN GIKVNFKIRH NIEDGSVQLA DHYQQNTPIG DGPVLLPDNH YLSTQSALSK DPNEKRDHMV LLEFVTAAGI TLGMDELYK

As a linker, I chose a flexible (GGGGS)3 linker of 15 aa. This type of linker is used in recombinant fusion protein design to increase the spatial separation between domains, which could be useful when fusing GFP to a large protein. Both termini of TH are functional: the N-terminal domain contains a phosphorylation site needed for enzyme activation, and the C-terminal domain mediates tetramerization. Since the functionality of a fused domain can only be tested empirically, I chose to attach the linker and GFP at the C-terminus, as a flexible linker can leave some spatial freedom for tetramerization.

Linker sequence: GGGGSGGGGSGGGGS

The full sequence (TH-linker-GFP): MPTPDATTPQAKGFRRAVSELDAKQAEAIMVRGQGAPGPSLTGSPWPGTAAPAASYTPTPRSPRFIGRRQSLIEDARKEREAAVAAAAAAVPSEPGDPLEAVAFEEKEGKAVLNLLFSPRATKPSALSRAVKVFETFEAKIHHLETRPAQRPRAGGPHLEYFVRLEVRRGDLAALLSGVRQVSEDVRSPAGPKVPWFPRKVSELDKCHHLVTKFDPDLDLDHPGFSDQVYRQRRKLIAEIAFQYRHGDPIPRVEYTAEEIATWKEVYTTLKGLYATHACGEHLEAFALLERFSGYREDNIPQLEDVSRFLKERTGFQLRPVAGLLSARDFLASLAFRVFQCTQYIRHASSPMHSPEPDCCHELLGHVPMLADRTFAQFSQDIGLASLGASDEEIEKLSTLYWFTVEFGLCKQNGEVKAYGAGLLSSYGELLHCLSEEPEIRAFDPEAAAVQPYQDQTYQSVYFVSESFSDAKDKLRSYASRIQRPFSVKFDPYTLAIDVLDSPQAVRRSLEGVQDELDTLAHALSAIG GGGGSGGGGSGGGGS MVSKGEELFT GVVPILVELD GDVNGHKFSV SGEGEGDATY GKLTLKFICT TGKLPVPWPT LVTTLTYGVQ CFSRYPDHMK QHDFFKSAMP EGYVQERTIF FKDDGNYKTR AEVKFEGDTL VNRIELKGID FKEDGNILGH KLEYNYNSHN VYIMADKQKN GIKVNFKIRH NIEDGSVQLA DHYQQNTPIG DGPVLLPDNH YLSTQSALSK DPNEKRDHMV LLEFVTAAGI TLGMDELYK
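The assembly of the fusion can be reproduced programmatically. The sketch below uses only short fragments from the start of each sequence for illustration, and the helper names are my own. With the lengths stated above (528 + 15 + 239), the full fusion should come to 782 residues.

```python
LINKER = "GGGGS" * 3  # the (GGGGS)3 linker described above

def clean(seq):
    """Remove spaces and line breaks (e.g. GenBank-style 10-residue blocks)."""
    return "".join(seq.split()).upper()

def build_fusion(n_part, c_part, linker=LINKER):
    """Concatenate the N-terminal part, the linker, and the C-terminal part."""
    return clean(n_part) + clean(linker) + clean(c_part)

# Toy fragments: the first residues of TH and EGFP, for illustration only
th_start = "MPTPDATTPQ"
gfp_start = "MVSKGEELFT GVVPILVELD"

fusion = build_fusion(th_start, gfp_start)
print(fusion, len(fusion))
```

Passing the complete TH and EGFP sequences instead of the fragments gives the full TH-linker-GFP sequence, and checking `len(fusion)` is a quick guard against copy-paste truncation.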

3.2. Reverse Translate: Protein (amino acid) sequence to DNA (nucleotide) sequence.

atgccgaccccggatgcgaccaccccgcaggcgaaaggctttcgccgcgcggtgagcgaa ctggatgcgaaacaggcggaagcgattatggtgcgcggccagggcgcgccgggcccgagc ctgaccggcagcccgtggccgggcaccgcggcgccggcggcgagctataccccgaccccg cgcagcccgcgctttattggccgccgccagagcctgattgaagatgcgcgcaaagaacgc gaagcggcggtggcggcggcggcggcggcggtgccgagcgaaccgggcgatccgctggaa gcggtggcgtttgaagaaaaagaaggcaaagcggtgctgaacctgctgtttagcccgcgc gcgaccaaaccgagcgcgctgagccgcgcggtgaaagtgtttgaaacctttgaagcgaaa attcatcatctggaaacccgcccggcgcagcgcccgcgcgcgggcggcccgcatctggaa tattttgtgcgcctggaagtgcgccgcggcgatctggcggcgctgctgagcggcgtgcgc caggtgagcgaagatgtgcgcagcccggcgggcccgaaagtgccgtggtttccgcgcaaa gtgagcgaactggataaatgccatcatctggtgaccaaatttgatccggatctggatctg gatcatccgggctttagcgatcaggtgtatcgccagcgccgcaaactgattgcggaaatt gcgtttcagtatcgccatggcgatccgattccgcgcgtggaatataccgcggaagaaatt gcgacctggaaagaagtgtataccaccctgaaaggcctgtatgcgacccatgcgtgcggc gaacatctggaagcgtttgcgctgctggaacgctttagcggctatcgcgaagataacatt ccgcagctggaagatgtgagccgctttctgaaagaacgcaccggctttcagctgcgcccg gtggcgggcctgctgagcgcgcgcgattttctggcgagcctggcgtttcgcgtgtttcag tgcacccagtatattcgccatgcgagcagcccgatgcatagcccggaaccggattgctgc catgaactgctgggccatgtgccgatgctggcggatcgcacctttgcgcagtttagccag gatattggcctggcgagcctgggcgcgagcgatgaagaaattgaaaaactgagcaccctg tattggtttaccgtggaatttggcctgtgcaaacagaacggcgaagtgaaagcgtatggc gcgggcctgctgagcagctatggcgaactgctgcattgcctgagcgaagaaccggaaatt cgcgcgtttgatccggaagcggcggcggtgcagccgtatcaggatcagacctatcagagc gtgtattttgtgagcgaaagctttagcgatgcgaaagataaactgcgcagctatgcgagc cgcattcagcgcccgtttagcgtgaaatttgatccgtataccctggcgattgatgtgctg gatagcccgcaggcggtgcgccgcagcctggaaggcgtgcaggatgaactggataccctg gcgcatgcgctgagcgcgattggcggcggcggcggcagcggcggcggcggcagcggcggc ggcggcagcatggtgagcaaaggcgaagaactgtttaccggcgtggtgccgattctggtg gaactggatggcgatgtgaacggccataaatttagcgtgagcggcgaaggcgaaggcgat gcgacctatggcaaactgaccctgaaatttatttgcaccaccggcaaactgccggtgccg tggccgaccctggtgaccaccctgacctatggcgtgcagtgctttagccgctatccggat catatgaaacagcatgatttttttaaaagcgcgatgccggaaggctatgtgcaggaacgc 
accattttttttaaagatgatggcaactataaaacccgcgcggaagtgaaatttgaaggc gataccctggtgaaccgcattgaactgaaaggcattgattttaaagaagatggcaacatt ctgggccataaactggaatataactataacagccataacgtgtatattatggcggataaa cagaaaaacggcattaaagtgaactttaaaattcgccataacattgaagatggcagcgtg cagctggcggatcattatcagcagaacaccccgattggcgatggcccggtgctgctgccg gataaccattatctgagcacccagagcgcgctgagcaaagatccgaacgaaaaacgcgat catatggtgctgctggaatttgtgaccgcggcgggcattaccctgggcatggatgaactg tataaataa
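The reverse translation above uses a single codon for each amino acid. A minimal sketch of that approach follows; the codon table was read off the sequence above rather than taken from the tool's documentation, so treat it as an assumption about how the tool works.

```python
# One codon per amino acid, as observed in the reverse-translated sequence above
CODON_FOR = {
    "A": "gcg", "C": "tgc", "D": "gat", "E": "gaa", "F": "ttt",
    "G": "ggc", "H": "cat", "I": "att", "K": "aaa", "L": "ctg",
    "M": "atg", "N": "aac", "P": "ccg", "Q": "cag", "R": "cgc",
    "S": "agc", "T": "acc", "V": "gtg", "W": "tgg", "Y": "tat",
}
STOP = "taa"  # stop codon that ends the sequence above

def reverse_translate(protein):
    """Map each residue to its single chosen codon and append a stop codon."""
    residues = "".join(protein.split()).upper()
    return "".join(CODON_FOR[aa] for aa in residues) + STOP

# First 20 residues of TH
print(reverse_translate("MPTPDATTPQAKGFRRAVSE"))
```

Because every residue always gets the same codon, the result has long repeats and a skewed codon distribution, which is one reason the codon optimization step follows.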

3.3. Codon optimization

Codon optimization is needed to adapt the DNA sequence to the different frequencies of tRNAs in the target organism in which expression is desired. It’s necessary because the same amino acid can be encoded by multiple synonymous codons, while different organisms have different preferences for which codons they use most frequently. By choosing codons that match the most abundant tRNAs in that organism, translation becomes more efficient.

TH-GFP sequence optimised for E. coli:

ATGCCTACGCCCGACGCAACAACGCCTCAAGCAAAAGGCTTTCGCCGGGCGGTTTCCGAACTGGATGCCAAGCAGGCGGAAGCCATTATGGTTCGTGGACAAGGCGCACCGGGTCCCAGCCTTACGGGTAGCCCTTGGCCGGGTACTGCGGCACCTGCTGCTAGCTACACGCCTACTCCTCGCTCACCCCGTTTTATAGGACGTCGTCAATCTCTCATAGAAGATGCTCGCAAAGAACGCGAAGCAGCAGTTGCAGCAGCGGCAGCAGCGGTACCTTCCGAGCCCGGAGACCCTTTAGAGGCTGTTGCATTTGAGGAAAAAGAAGGTAAAGCAGTTCTGAATTTGCTTTTCTCTCCTCGTGCGACAAAACCTTCGGCACTGTCACGGGCTGTCAAGGTTTTCGAAACTTTCGAAGCTAAAATTCACCATTTAGAAACACGACCGGCGCAGCGTCCGCGTGCCGGGGGGCCTCACTTGGAGTACTTCGTGCGTCTGGAGGTTCGACGTGGCGACCTTGCTGCTCTGTTGAGCGGTGTGCGCCAGGTTTCCGAAGATGTTCGTAGTCCTGCCGGACCTAAAGTACCATGGTTTCCGCGCAAAGTTTCCGAATTGGATAAGTGTCATCATCTTGTGACGAAATTTGATCCGGATCTTGACCTCGACCATCCGGGGTTCTCTGATCAGGTGTATCGTCAGCGTCGCAAACTCATTGCAGAGATTGCTTTTCAATATCGCCATGGCGACCCGATTCCCCGCGTAGAGTATACCGCTGAAGAAATAGCTACTTGGAAAGAAGTGTACACAACCCTGAAGGGCTTATATGCTACACACGCGTGTGGCGAACATTTAGAAGCCTTTGCTCTTCTCGAACGTTTCTCAGGTTATAGAGAGGACAACATTCCACAGTTAGAGGACGTTTCCCGATTTCTCAAAGAACGTACCGGCTTTCAGCTGAGACCCGTGGCCGGTTTATTGTCTGCTCGTGATTTCCTGGCATCACTGGCCTTTAGAGTATTCCAGTGTACTCAGTATATTCGCCATGCTTCCTCGCCAATGCACTCACCCGAACCAGATTGTTGCCATGAGTTACTTGGACATGTACCAATGCTCGCAGATCGAACATTTGCGCAATTCTCTCAAGATATCGGCCTGGCTAGTTTAGGCGCTTCAGATGAAGAAATTGAAAAGCTGTCCACACTGTACTGGTTCACCGTAGAATTTGGACTGTGCAAACAGAATGGCGAGGTTAAAGCGTACGGTGCCGGGCTTCTGTCCAGCTATGGTGAATTACTGCACTGTCTGTCAGAGGAGCCGGAGATTCGCGCATTTGATCCTGAAGCAGCCGCCGTCCAGCCATATCAAGATCAGACGTACCAGTCTGTGTATTTTGTTTCCGAAAGCTTTTCAGATGCCAAGGATAAGTTGCGCTCTTACGCTTCACGTATCCAACGCCCGTTTTCTGTAAAGTTCGACCCGTATACGCTGGCCATTGACGTCCTGGATAGCCCACAGGCAGTGCGCAGAAGTCTTGAAGGGGTTCAAGATGAGCTCGATACACTTGCCCATGCCCTTTCCGCTATAGGCGGGGGTGGTGGCTCTGGCGGTGGAGGTAGTGGAGGGGGTGGGAGCATGGTTTCAAAAGGGGAGGAGTTGTTTACTGGCGTGGTCCCAATCCTGGTAGAGTTAGACGGAGATGTTAACGGGCACAAATTCAGCGTTAGTGGTGAAGGGGAAGGCGACGCTACATATGGTAAACTGACACTGAAATTTATTTGTACCACCGGTAAGCTCCCAGTGCCCTGGCCGACTTTGGTTACCACGTTGACATATGGTGTACAATGTTTCTCCCGCTATCCTGACCACATGAAACAACATGATTTTTTCAAATCTGCTATGCCGGAAGGATATGTACAGGAACGTACGATCTTCTTCAAAGATGATGGCAACTATAAAACACGTGCCGAGGTTAAATTTGAGGGTGATACGCTGGTGAATCGCAT
TGAGTTAAAAGGAATAGACTTTAAGGAGGATGGGAATATTCTTGGCCACAAACTGGAGTACAATTACAATTCTCATAATGTGTATATCATGGCTGATAAACAGAAAAATGGTATCAAGGTTAACTTCAAAATCCGTCATAATATCGAGGATGGTTCTGTTCAGCTTGCTGATCATTATCAGCAAAATACGCCAATCGGTGATGGACCAGTCCTGTTGCCTGATAATCATTACCTCTCTACACAGTCAGCGCTGTCCAAAGACCCAAATGAGAAACGAGATCATATGGTATTGCTGGAATTCGTTACCGCTGCCGGAATTACACTTGGCATGGATGAATTATACAAATAA
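A useful sanity check after codon optimization is that the optimized DNA still encodes the same protein. The sketch below translates DNA with the standard genetic code and compares the first six codons of the unoptimized and optimized sequences; the function and variable names are my own.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate DNA in frame 0, stopping at the first stop codon."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

unoptimized = "atgccgaccccggatgcg"  # first 18 nt of the reverse translation
optimized = "ATGCCTACGCCCGACGCA"    # first 18 nt of the optimized sequence
print(translate(unoptimized), translate(optimized))
```

Running `translate` on both full sequences and comparing the results with the TH-linker-GFP protein sequence would confirm that optimization changed only synonymous codons.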

3.4. You have a sequence! Now what?

Protein synthesis in cells. The double-stranded DNA first needs to be locally unwound so that a single strand becomes accessible to RNA polymerase. RNA polymerase binds at a promoter, reads the ‘template’ strand in the 3’ to 5’ direction, and produces an mRNA strand in the 5’ to 3’ direction. In eukaryotic cells, the mRNA is modified during transcription and right after it: a cap (a modified G attached through a triphosphate link) is added to its 5’ end, the mRNA is spliced, and a poly-A tail is added to its 3’ end. These modifications stabilise the mRNA, protect it from nucleases, and prepare it for export to the cytoplasm, recognition by ribosomes, and translation initiation. The mature mRNA then leaves the nucleus for translation in the cytoplasm.

The mRNA is read in the 5’ to 3’ direction, codon by codon, within a ribosome, a protein-RNA complex that acts as a ribozyme because its catalytic centre lies in the ribosomal RNA. Protein synthesis in the ribosome proceeds through initiation, elongation, and termination, assisted by initiation, elongation, and termination factors that mediate the meeting and release of mRNA and tRNAs between the ribosomal subunits. tRNAs recognise codons with their anticodons and carry the corresponding amino acid, which is joined to the previous one by the catalytic activity of the ribosome.

If a protein is produced in bacteria, there is almost no preprocessing of the mRNA (no splicing as in eukaryotes, since bacterial genes generally lack introns, although the 5’ end is protected and a poly-A tail can be added, which marks the mRNA for degradation), and translation is coupled with transcription in the cytoplasm, on ribosomes that are smaller than eukaryotic ones (70S vs 80S).

After synthesis, proteins undergo modifications, which are much more complex in eukaryotic cells than in bacteria. In eukaryotes, many post-translational modifications occur in membrane-bound organelles (the endoplasmic reticulum and the Golgi apparatus); the best-studied modifications include phosphorylation, acetylation, disulfide bond formation, glycosylation, and ubiquitination. In bacteria, these modifications are much simpler and happen in the cytoplasm.

Cell-free protein synthesis. A cell extract (eukaryotic or bacterial) or a PURE system is added to a DNA template; the extract or mix contains all the enzymes and factors needed for transcription and translation, including energy sources and helicases.

3.5. How does it work in nature/biological systems?

  1. Describe how a single gene codes for multiple proteins at the transcriptional level.

During transcription in eukaryotes, RNA (pre-mRNA) gets spliced: introns, which are non-coding sequences, are cut out while exons are joined. Pre-mRNA goes through either regular (constitutive) or alternative splicing in a spliceosome. In regular splicing, all introns are removed and all exons are joined together, while in alternative splicing, some exons are removed and some are included. Which kind of splicing occurs depends on the gene, cell type, and conditions (including developmental stage), which determine the specific regulatory proteins that mediate splicing. Alternative splicing comes in several variants: an exon is skipped, an intron is retained in the sequence, or an alternative splice site is selected. The splicing process is regulated by proteins that bind to pre-mRNA sites near exons and introns and either enhance or repress splicing (their relative concentration can also be a regulating factor, not just their absence or presence). These proteins regulate the assembly of spliceosomes (complexes of specialized proteins and small RNAs) on pre-mRNA and determine where a spliceosome assembles and how efficiently it binds. A spliceosome cuts out and joins fragments of pre-mRNA. Overall, the process allows a variety of proteins to be synthesized from a single gene.

During transcription in bacteria, almost no splicing occurs, but still, different proteins can be produced from a single gene. RNA polymerase can be directed to different sequences related to a single gene by transcription factors, and so different transcripts are produced. Also, a ribosome can shift a reading frame due to specific patterns in mRNA sequence.

  1. Try aligning the DNA sequence, the transcribed RNA, and also the resulting translated Protein!!! See example below.
TH-GFP DNA, RNA, Protein aligned

Rearranged snapshot of TH-GFP protein information flow from DNA to RNA to Protein. Captured from my Benchling and stitched together in a ppt.
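The same DNA → RNA → protein alignment can be sketched in a few lines of Python. This is only an illustration, not the Benchling workflow used for the snapshot: it assumes the standard genetic code, takes the coding strand as input, and uses just the first 15 nucleotides of the codon-optimized coding sequence from section 4.2 as a toy input. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Return the mRNA that matches the coding strand (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(coding_dna):
    """Translate the coding strand codon by codon, stopping at the first stop codon."""
    dna = coding_dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy input: first 15 nt of the coding sequence listed in section 4.2
dna = "ATGCCTACGCCCGAC"
rna = transcribe(dna)
protein = translate(dna)

# Print the three layers codon by codon
for i in range(0, len(dna), 3):
    print(dna[i:i + 3], rna[i:i + 3], protein[i // 3])
```

Each printed row holds one DNA codon, the matching RNA codon, and the encoded amino acid, which is the same layout as the aligned snapshot.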

4.1. Create a Twist account and a Benchling account

4.2. Build Your DNA Insert Sequence

A sequence for expression in E. coli:

Promoter (e.g. BBa_J23106): TTTACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC RBS (e.g. BBa_B0034 with spacers for optimal expression): CATTAAAGAGGAGAAAGGTACC Start Codon: ATG Coding Sequence (codon optimized HT_GFP DNA): ATGCCTACGCCCGACGCAACAACGCCTCAAGCAAAAGGCTTTCGCCGGGCGGTTTCCGAACTGGATGCCAAGCAGGCGGAAGCCATTATGGTTCGTGGACAAGGCGCACCGGGTCCCAGCCTTACGGGTAGCCCTTGGCCGGGTACTGCGGCACCTGCTGCTAGCTACACGCCTACTCCTCGCTCACCCCGTTTTATAGGACGTCGTCAATCTCTCATAGAAGATGCTCGCAAAGAACGCGAAGCAGCAGTTGCAGCAGCGGCAGCAGCGGTACCTTCCGAGCCCGGAGACCCTTTAGAGGCTGTTGCATTTGAGGAAAAAGAAGGTAAAGCAGTTCTGAATTTGCTTTTCTCTCCTCGTGCGACAAAACCTTCGGCACTGTCACGGGCTGTCAAGGTTTTCGAAACTTTCGAAGCTAAAATTCACCATTTAGAAACACGACCGGCGCAGCGTCCGCGTGCCGGGGGGCCTCACTTGGAGTACTTCGTGCGTCTGGAGGTTCGACGTGGCGACCTTGCTGCTCTGTTGAGCGGTGTGCGCCAGGTTTCCGAAGATGTTCGTAGTCCTGCCGGACCTAAAGTACCATGGTTTCCGCGCAAAGTTTCCGAATTGGATAAGTGTCATCATCTTGTGACGAAATTTGATCCGGATCTTGACCTCGACCATCCGGGGTTCTCTGATCAGGTGTATCGTCAGCGTCGCAAACTCATTGCAGAGATTGCTTTTCAATATCGCCATGGCGACCCGATTCCCCGCGTAGAGTATACCGCTGAAGAAATAGCTACTTGGAAAGAAGTGTACACAACCCTGAAGGGCTTATATGCTACACACGCGTGTGGCGAACATTTAGAAGCCTTTGCTCTTCTCGAACGTTTCTCAGGTTATAGAGAGGACAACATTCCACAGTTAGAGGACGTTTCCCGATTTCTCAAAGAACGTACCGGCTTTCAGCTGAGACCCGTGGCCGGTTTATTGTCTGCTCGTGATTTCCTGGCATCACTGGCCTTTAGAGTATTCCAGTGTACTCAGTATATTCGCCATGCTTCCTCGCCAATGCACTCACCCGAACCAGATTGTTGCCATGAGTTACTTGGACATGTACCAATGCTCGCAGATCGAACATTTGCGCAATTCTCTCAAGATATCGGCCTGGCTAGTTTAGGCGCTTCAGATGAAGAAATTGAAAAGCTGTCCACACTGTACTGGTTCACCGTAGAATTTGGACTGTGCAAACAGAATGGCGAGGTTAAAGCGTACGGTGCCGGGCTTCTGTCCAGCTATGGTGAATTACTGCACTGTCTGTCAGAGGAGCCGGAGATTCGCGCATTTGATCCTGAAGCAGCCGCCGTCCAGCCATATCAAGATCAGACGTACCAGTCTGTGTATTTTGTTTCCGAAAGCTTTTCAGATGCCAAGGATAAGTTGCGCTCTTACGCTTCACGTATCCAACGCCCGTTTTCTGTAAAGTTCGACCCGTATACGCTGGCCATTGACGTCCTGGATAGCCCACAGGCAGTGCGCAGAAGTCTTGAAGGGGTTCAAGATGAGCTCGATACACTTGCCCATGCCCTTTCCGCTATAGGCGGGGGTGGTGGCTCTGGCGGTGGAGGTAGTGGAGGGGGTGGGAGCATGGTTTCAAAAGGGGAGGAGTTGTTTACTGGCGTGGTCCCAATCCTGGTAGAGTTAGACGGAGATGTTAACGGGCACAAATTCAGCGTTAGTGGTGAAGGGGAAGGCGACGCTACATATGGTAAACTGACACTGAAATTTATTTGTACCACCGGTAAGCTCC
CAGTGCCCTGGCCGACTTTGGTTACCACGTTGACATATGGTGTACAATGTTTCTCCCGCTATCCTGACCACATGAAACAACATGATTTTTTCAAATCTGCTATGCCGGAAGGATATGTACAGGAACGTACGATCTTCTTCAAAGATGATGGCAACTATAAAACACGTGCCGAGGTTAAATTTGAGGGTGATACGCTGGTGAATCGCATTGAGTTAAAAGGAATAGACTTTAAGGAGGATGGGAATATTCTTGGCCACAAACTGGAGTACAATTACAATTCTCATAATGTGTATATCATGGCTGATAAACAGAAAAATGGTATCAAGGTTAACTTCAAAATCCGTCATAATATCGAGGATGGTTCTGTTCAGCTTGCTGATCATTATCAGCAAAATACGCCAATCGGTGATGGACCAGTCCTGTTGCCTGATAATCATTACCTCTCTACACAGTCAGCGCTGTCCAAAGACCCAAATGAGAAACGAGATCATATGGTATTGCTGGAATTCGTTACCGCTGCCGGAATTACACTTGGCATGGATGAATTATACAAATAA 7x His Tag (Let’s add a 7×His tag at the C-terminus of the protein to enable protein purification from E. coli): CATCACCATCACCATCATCAC Stop Codon: TAA Terminator (e.g. BBa_B0015): CCAGGCATCAAATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGTCGGTGAACGCTCTCTACTAGAGTCACACTGGCTCACCTTCGGGTGGGCCTTTCTGCGTTTATA
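Before ordering, an insert assembled from parts like these can be checked for its reading frame. The sketch below is a minimal example under stated assumptions: the His-tag sequence is the one listed above, while the 15-nt coding sequence is only a stand-in for the full TH-GFP sequence, and the function name is hypothetical. It finds the first in-frame stop codon, which would flag, for example, a layout in which the coding sequence keeps its own stop codon ahead of a C-terminal tag.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(orf):
    """Return the 0-based nucleotide index of the first in-frame stop codon, or -1."""
    for i in range(0, len(orf) - 2, 3):
        if orf[i:i + 3] in STOP_CODONS:
            return i
    return -1


his_tag = "CATCACCATCACCATCATCAC"   # 7x His tag from the parts list
toy_cds = "ATGCCTACGCCCGAC"         # assumption: stand-in for the full coding sequence

# Layout A: the coding sequence ends with its own stop codon before the tag
layout_a = toy_cds + "TAA" + his_tag + "TAA"
# Layout B: the only stop codon comes after the tag
layout_b = toy_cds + his_tag + "TAA"

for name, orf in (("A", layout_a), ("B", layout_b)):
    stop = first_in_frame_stop(orf)
    tag_translated = stop >= len(toy_cds) + len(his_tag)
    print(f"layout {name}: first stop at nt {stop}, tag translated: {tag_translated}")
```

In layout A translation ends before the tag is reached, so the tag would not be part of the protein; in layout B the tag is translated.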

Annotate Insert

4.6. Choose Your Vector

Vector: pTwist Amp High Copy

TH-GFP_plasmid

TH-GFP Plasmid

Part 5: DNA Read/Write/Edit

5.1 DNA Read

  1. What DNA would you want to sequence (e.g., read) and why?

It could be useful to sequence the mitochondrial DNA (mtDNA) of dopaminergic neurons and other cells of PD patients. Mutant mtDNA accumulates with age, mitochondrial dysfunction is a common phenotype observed in brain tissue, and the link between mtDNA maintenance and neurodegeneration is well recognised. Sequencing may reveal patterns of mitochondrial dysfunction and mitochondrial genome instability that are useful for predicting vulnerability to PD and to other neurodegenerative and non-neurodegenerative conditions, as well as for investigating fundamental aspects of aging in general. A focus on mtDNA is relevant to a large patient population. However, for mtDNA to become a practical diagnostic biomarker, more accessible mtDNA from CSF, blood, or other peripheral tissue has to be included: tissue decline during aging is heterogeneous, and peripheral mtDNA may also contain information on the vulnerability of neuronal mtDNA to PD. It could be useful to aim at sequencing all 37 mitochondrial genes.

  1. In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?

Week 3 HW: Lab Automation

Part 1: Opentrons Artwork

This design was generated using the GUI at opentrons-art.rcdonovan.com and can be accessed through https://opentrons-art.rcdonovan.com/?id=1s7h4g7m1kn174o

Artwork design
Coordinatesmrfp1_points = [(27.5, 25.3),(25.3, 23.1),(23.1, 18.7),(20.9, 16.5),(18.7, 14.3),(36.3, 5.5),(38.5, 5.5),(27.5, 3.3),(29.7, 3.3),(31.9, 3.3),(34.1, 3.3),(36.3, 3.3),(20.9, 1.1),(23.1, 1.1),(25.3, 1.1),(16.5, -1.1),(18.7, -1.1)] mscarlet_i_points = [(29.7, 25.3),(27.5, 23.1),(29.7, 23.1),(25.3, 20.9),(27.5, 20.9),(25.3, 18.7),(23.1, 16.5),(20.9, 14.3),(16.5, 12.1),(18.7, 12.1),(16.5, 9.9),(14.3, 7.7),(38.5, 7.7),(12.1, 5.5),(34.1, 5.5),(9.9, 3.3),(38.5, 3.3),(14.3, -1.1)] electra2_points = [(31.9, 23.1),(29.7, 20.9),(31.9, 20.9),(34.1, 20.9),(27.5, 18.7),(29.7, 18.7),(31.9, 18.7),(25.3, 16.5),(27.5, 16.5),(29.7, 16.5),(23.1, 14.3),(25.3, 14.3),(27.5, 14.3),(20.9, 12.1),(23.1, 12.1),(18.7, 9.9),(20.9, 9.9),(16.5, 7.7),(18.7, 7.7),(14.3, 5.5),(16.5, 5.5),(12.1, 3.3),(9.9, 1.1),(9.9, -1.1)] mturquoise2_points = [(34.1, 18.7),(-5.5, 16.5),(31.9, 16.5),(34.1, 16.5),(36.3, 16.5),(29.7, 14.3),(31.9, 14.3),(34.1, 14.3),(25.3, 12.1),(27.5, 12.1),(29.7, 12.1),(23.1, 9.9),(25.3, 9.9),(27.5, 9.9),(20.9, 7.7),(23.1, 7.7),(1.1, 5.5),(18.7, 5.5),(14.3, 3.3),(16.5, 3.3),(12.1, 1.1),(23.1, -14.3),(7.7, -16.5),(9.9, -16.5),(12.1, -16.5),(14.3, -16.5),(-7.7, -18.7),(-5.5, -18.7),(-3.3, -18.7),(-1.1, -18.7),(1.1, -18.7),(3.3, -18.7),(5.5, -18.7),(7.7, -18.7),(9.9, -18.7),(12.1, -18.7),(16.5, -18.7),(-9.9, -20.9),(-7.7, -20.9),(-5.5, -20.9),(-3.3, -20.9),(-1.1, -20.9),(1.1, -20.9),(3.3, -20.9),(5.5, -20.9),(7.7, -20.9),(9.9, -20.9),(12.1, -20.9),(14.3, -20.9),(18.7, -20.9),(-12.1, -23.1),(-9.9, -23.1),(-7.7, -23.1),(-5.5, -23.1),(-3.3, -23.1),(-1.1, -23.1),(1.1, -23.1),(3.3, -23.1),(5.5, -23.1),(7.7, -23.1),(9.9, -23.1),(12.1, -23.1),(14.3, -23.1),(16.5, -23.1),(-14.3, -25.3),(-12.1, -25.3),(-9.9, -25.3),(-7.7, -25.3),(-5.5, -25.3),(-3.3, -25.3),(-1.1, -25.3),(1.1, -25.3),(3.3, -25.3),(5.5, -25.3),(-16.5, -27.5),(-14.3, -27.5),(-12.1, -27.5),(-9.9, -27.5)] azurite_points = [(-5.5, 14.3),(-3.3, 14.3),(-5.5, 12.1),(-1.1, 12.1),(-5.5, 9.9),(-3.3, 9.9),(-1.1, 9.9),(1.1, 
9.9),(-7.7, 7.7),(-5.5, 7.7),(-3.3, 7.7),(-1.1, 7.7),(3.3, 7.7),(-7.7, 5.5),(-5.5, 5.5),(-3.3, 5.5),(-1.1, 5.5),(3.3, 5.5),(5.5, 5.5),(-7.7, 3.3),(-5.5, 3.3),(-3.3, 3.3),(-1.1, 3.3),(1.1, 3.3),(5.5, 3.3),(7.7, 3.3),(-7.7, 1.1),(-5.5, 1.1),(-3.3, 1.1),(-1.1, 1.1),(1.1, 1.1),(5.5, 1.1),(7.7, 1.1),(-9.9, -1.1),(-7.7, -1.1),(-5.5, -1.1),(-3.3, -1.1),(-1.1, -1.1),(1.1, -1.1),(3.3, -1.1),(5.5, -1.1),(7.7, -1.1),(-9.9, -3.3),(-7.7, -3.3),(-5.5, -3.3),(-3.3, -3.3),(-1.1, -3.3),(1.1, -3.3),(3.3, -3.3),(5.5, -3.3),(7.7, -3.3),(9.9, -3.3),(12.1, -3.3),(14.3, -3.3),(9.9, -5.5),(12.1, -5.5),(14.3, -5.5),(16.5, -5.5),(-12.1, -7.7),(-9.9, -7.7),(-7.7, -7.7),(-5.5, -7.7),(-3.3, -7.7),(-1.1, -7.7),(1.1, -7.7),(3.3, -7.7),(5.5, -7.7),(7.7, -7.7),(9.9, -7.7),(12.1, -7.7),(14.3, -7.7),(16.5, -7.7),(18.7, -7.7),(-12.1, -9.9),(-9.9, -9.9),(-7.7, -9.9),(-5.5, -9.9),(-3.3, -9.9),(-1.1, -9.9),(1.1, -9.9),(3.3, -9.9),(5.5, -9.9),(7.7, -9.9),(9.9, -9.9),(14.3, -9.9),(16.5, -9.9),(18.7, -9.9),(20.9, -9.9),(-12.1, -12.1),(-9.9, -12.1),(-7.7, -12.1),(-5.5, -12.1),(-3.3, -12.1),(-1.1, -12.1),(1.1, -12.1),(3.3, -12.1),(5.5, -12.1),(7.7, -12.1),(9.9, -12.1),(12.1, -12.1),(14.3, -12.1),(16.5, -12.1),(18.7, -12.1),(20.9, -12.1),(23.1, -12.1),(-12.1, -14.3),(-9.9, -14.3),(-7.7, -14.3),(-5.5, -14.3),(-3.3, -14.3),(-1.1, -14.3),(1.1, -14.3),(3.3, -14.3),(5.5, -14.3),(7.7, -14.3),(9.9, -14.3),(12.1, -14.3),(16.5, -14.3),(18.7, -14.3),(20.9, -14.3),(-12.1, -16.5),(-9.9, -16.5),(-7.7, -16.5),(-5.5, -16.5),(-3.3, -16.5),(-1.1, -16.5),(1.1, -16.5),(3.3, -16.5),(5.5, -16.5),(16.5, -16.5),(18.7, -16.5),(20.9, -16.5),(-12.1, -18.7),(-9.9, -18.7),(14.3, -18.7),(18.7, -18.7),(-14.3, -20.9),(-12.1, -20.9),(16.5, -20.9),(-14.3, -23.1),(-16.5, -25.3)] sfgfp_points = [(36.3, 14.3),(31.9, 12.1),(34.1, 12.1),(36.3, 12.1),(29.7, 9.9),(31.9, 9.9),(25.3, 7.7),(27.5, 7.7),(20.9, 5.5),(23.1, 5.5),(18.7, 3.3),(14.3, 1.1)] venus_points = [(34.1, 9.9),(36.3, 9.9),(29.7, 7.7),(25.3, 5.5),(20.9, 3.3),(16.5, 1.1)] mko2_points = 
[(-36.3, 12.1),(-38.5, 9.9),(-36.3, 9.9),(-34.1, 9.9),(38.5, 9.9),(-38.5, 7.7),(-36.3, 7.7),(-34.1, 7.7),(-31.9, 7.7),(31.9, 7.7),(34.1, 7.7),(36.3, 7.7),(-34.1, 5.5),(-31.9, 5.5),(-29.7, 5.5),(27.5, 5.5),(29.7, 5.5),(31.9, 5.5),(-29.7, 3.3),(-27.5, 3.3),(-25.3, 3.3),(23.1, 3.3),(25.3, 3.3),(-25.3, 1.1),(-23.1, 1.1),(-20.9, 1.1),(18.7, 1.1),(-20.9, -1.1),(-18.7, -1.1),(-16.5, -1.1),(12.1, -1.1),(-14.3, -3.3)] mjuniper_points = [(-3.3, 12.1),(1.1, 7.7),(3.3, 3.3),(3.3, 1.1),(-9.9, -5.5),(-7.7, -5.5),(-5.5, -5.5),(-3.3, -5.5),(-1.1, -5.5),(1.1, -5.5),(3.3, -5.5),(5.5, -5.5),(7.7, -5.5),(12.1, -9.9),(14.3, -14.3)]
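Coordinate lists like these can be sanity-checked before a run. The sketch below is a minimal example under stated assumptions: the coordinates are taken to be millimetres relative to the plate centre, the 40 mm usable radius is a placeholder rather than a documented Opentrons value, and only a small subset of the points above is used. The function name is hypothetical.

```python
import math


def check_design(design, max_radius_mm=40.0):
    """Return (points outside the radius, points claimed by two colours)."""
    outside = []
    clashes = []
    owner = {}
    for colour, points in design.items():
        for point in points:
            if math.hypot(*point) > max_radius_mm:
                outside.append((colour, point))
            if point in owner and owner[point] != colour:
                clashes.append((point, owner[point], colour))
            owner.setdefault(point, colour)
    return outside, clashes


# Subset of the design above, for illustration only
design = {
    "mrfp1": [(27.5, 25.3), (25.3, 23.1), (23.1, 18.7)],
    "venus": [(34.1, 9.9), (36.3, 9.9), (29.7, 7.7),
              (25.3, 5.5), (20.9, 3.3), (16.5, 1.1)],
}

outside, clashes = check_design(design)
print("outside plate:", outside)
print("colour clashes:", clashes)
```

Both lists are empty when every point fits on the plate and no position is assigned to two colours.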

Part 2: Post-Lab Questions

Part 3: Final Project Ideas

Project 1: Tunable Induction of Alpha-Synuclein Expression for Modeling Parkinson’s Disease

Aim:

This project aims to develop a tool to promote Parkinson’s disease phenotype manifestation by controllable induction of alpha-synuclein expression in dopaminergic neurons within patient-derived brain organoids.

Background:

Parkinson’s disease (PD) is driven by alpha-synuclein misfolding and accumulation in dopaminergic neurons, triggered by interconnected failures in multiple cellular processes. Current PD models using AAV-mediated alpha-synuclein overexpression and exogenous fibril seeding are effective at replicating key features of sporadic PD but lack controllability, limiting their value for investigating which cellular systems fail under pathological alpha-synuclein load in individual patients.

Patient-derived brain organoids naturally recapitulate human-specific neurodegenerative features, but their use for studying PD is constrained by the months required for PD phenotype manifestation. Tools that accelerate and standardize pathological phenotype induction in human tissue culture models are therefore needed.

Tool Description:

The tool uses a genetic circuit for controllable oscillatory overexpression of alpha-synuclein. The circuit will employ a small molecule-activated sensor-promoter to initiate alpha-synuclein expression, a delayed negative feedback loop with a repressor to generate self-limiting oscillatory expression, and an external OFF switch to terminate the expression.

Significance:

The tool will hopefully:

  • enable standardized and accelerated induction of PD phenotypes and
  • allow probing patient-specific vulnerabilities and
  • allow testing personalized therapeutic strategies in organoid platforms.
Project 2: Sensing α-Synuclein-Driven Mitochondrial Proteostatic Failure in Parkinson’s Disease with an RNA Toehold Switch

Aim:

This project aims to design a sensor for mitochondrial dysfunction in models of Parkinson’s disease (PD). An RNA toehold switch sensor will target mitochondrial protease (ClpP) mRNA, which is expected to rise in Parkinson’s disease models.

Background:

ClpP is a mitochondrial matrix protease that degrades misfolded or damaged proteins and was recently shown to be inhibited by α-synuclein (through direct binding at the NAC domain), representing a novel mechanistic link between α-synuclein pathology and mitochondrial proteostatic failure in Parkinson’s disease. When ClpP activity is chronically suppressed by accumulating α-synuclein, the cell is expected to sense the resulting proteostatic stress through the mitochondrial unfolded protein response, which in mammals includes a nuclear transcriptional response with ATF5-driven upregulation of ClpP as a compensatory mechanism. This creates a scenario in which ClpP mRNA levels are expected to rise despite, and because of, functional ClpP insufficiency at the protein level.

Sensor Description:

An RNA sensor targeting ClpP mRNA would therefore report not on ClpP activity directly, but on the cell’s transcriptional response to its own proteostatic failure, serving as an indirect proxy for the α-synuclein-driven mitochondrial dysfunction that precedes late neurodegeneration and neuronal death.

Significance:

If 1) ATF5-driven ClpP upregulation in dopaminergic neurons is confirmed and 2) the sensor construct is validated in dopaminergic neurons, this sensor could provide an early, mitochondria-specific readout of PD-relevant stress in brain organoid models and become a drug screening tool to identify small molecules that restore the activity of the protease and reduce pathological α-synuclein accumulation (shift the tetrameric:monomeric α-synuclein balance), with the sensor itself as the readout.

Week 4: Protein Design Part I

Part A. Conceptual Questions

Q1: How many molecules of amino acids do you take with a piece of 500 grams of meat? (on average an amino acid is ~100 Daltons)

1 Da equals 1.66053906892×10⁻²⁷ kg, so an average amino acid of ~100 Da weighs ~1.66×10⁻²⁵ kg. The average protein fraction in meat is ~20%, so 500 g of meat contains ~100 g of protein. The number of amino acids in 100 g = 0.1 kg of protein is therefore 0.1 kg / 1.66×10⁻²⁵ kg ≈ 6×10²³, which is about Avogadro’s number, i.e. ~1 mole.
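The estimate can be reproduced with a short calculation. The 20% protein fraction and the 100 Da average amino acid mass are the rough assumptions used in the answer, not measured values.

```python
DALTON_KG = 1.66053906892e-27   # mass of 1 Da in kg
AVOGADRO = 6.02214076e23        # particles per mole

meat_kg = 0.5                   # 500 g of meat
protein_fraction = 0.20         # assumption: ~20% of meat is protein
avg_aa_da = 100                 # assumption: average amino acid mass in Da

protein_kg = meat_kg * protein_fraction
n_amino_acids = protein_kg / (avg_aa_da * DALTON_KG)
moles = n_amino_acids / AVOGADRO

print(f"{n_amino_acids:.2e} amino acids, about {moles:.2f} mol")
```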

Q2: Why do humans eat beef but do not become a cow, eat fish but do not become fish?

To become another organism, we would need to use its DNA to produce its proteins within us. However, the DNA, RNA, and proteins we consume are broken down and absorbed, so only their building blocks, and not the information their sequences carry, are used in our body.

Q3: Why are there only 20 natural amino acids?

The set of 20 amino acids has been established through evolution, and it would be nearly impossible to reassign already fixed codons because such a change would corrupt thousands of proteins.

Q4: Can you make other non-natural amino acids? Design some new amino acids.

Yes. Design strategies can include adding an azide group to a side chain, so that the amino acid can take part in click chemistry reactions, or incorporating an isotope label.

Q5: Where did amino acids come from before enzymes that make them, and before life started?

The production of amino acids does not require enzymes. Enzyme-independent production of amino acids was possible in the early Earth atmosphere, in hydrothermal vents, and through the Strecker synthesis. Additionally, amino acids reach Earth with meteorite material.

Q6: If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?

The resulting helix will be left-handed.

Q7: Can you discover additional helices in proteins?

Yes. Several other types (4+) of helices have been identified beyond α-helices.

Q8: Why are most molecular helices right-handed?

Most molecular helices are right-handed because they are composed of left-handed (L-) monomers. In a left-handed helix, the side chains of L-monomers would be crowded against the backbone, which is energetically unfavourable.

Q9: Why do β-sheets tend to aggregate? What is the driving force for β-sheet aggregation?

They aggregate because exposed N-H groups (hydrogen-bond donors) and C=O groups (acceptors) on the edge of a sheet form hydrogen bonds with the edges of other sheets (the driving force); because hydrophobic faces associate with each other; because identical sheets are sterically compatible; because the electrostatic periodicity of the sheets matches; and because the configurational entropy cost of aggregation is low.

Q10: Why do many amyloid diseases form β-sheets? Can you use amyloid β-sheets as materials?

Yes, β-sheets are used as materials (Kevlar, peptide hydrogels as tissue engineering scaffolds).

Due to stress, mutations, and aging, proteins partially unfold. β-sheets with an alternation of hydrophobic and hydrophilic sites ease aggregation and then the growth of stable aggregates. Sequences with alternating hydrophobic and hydrophilic residues get exposed and form β-sheet conformation due to thermodynamic stability. These sheets bind and stack due to hydrogen bonds and hydrophobic interactions, and the oligomers then grow to exceptionally stable fibers.

Q11: Design a β-sheet motif that forms a well-ordered structure.

TBA

Part B: Protein Analysis and Visualization

1. Protein choice

I chose alpha-synuclein (SNCA), UniProt P37840, because its toxic form accumulates in Parkinson’s disease, and the mechanisms of alpha-synuclein aggregation are still being investigated.

The protein comprises 3 domains: an amphipathic N-terminal domain (1-60), a hydrophobic NAC domain (61-95), and an acidic C-terminus (96-140). Alternative splicing produces three isoforms: the canonical isoform 1 of 140 aa, isoform 2 of 112 aa, and isoform 3 of 126 aa. The protein is normally a monomer; it is most abundant in the brain, where it is concentrated in the presynaptic terminals of nerve cells. The ‘non A-beta component of Alzheimer disease amyloid plaque’ domain (NAC domain) is hydrophobic and is involved in fibril formation. The C-terminus may regulate aggregation and determine the diameter of the filaments.

2. Protein Sequence and Its Analysis

Alpha-synuclein — Sequence
UniProt ID: P37840 | PDB: 1XQ8
MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKKDQLGKNEEGAPQEGILEDMPVDPDNEAYEMPSEEGYQDYEPEA

• SNCA is 140 aa long. The most common aa are A (Alanine) and V (Valine).

Amino Acid Frequencies:

V: 19 (13.57%)

A: 19 (13.57%)

G: 18 (12.86%)

E: 18 (12.86%)

K: 15 (10.71%)

T: 10 (7.14%)

D: 6 (4.29%)

Q: 6 (4.29%)

P: 5 (3.57%)

M: 4 (2.86%)

L: 4 (2.86%)

S: 4 (2.86%)

Y: 4 (2.86%)

N: 3 (2.14%)

F: 2 (1.43%)

I: 2 (1.43%)

H: 1 (0.71%)

• 250 homologs were identified with UniProt’s BLAST tool

• Alpha-synuclein is a member of the synuclein family, which also includes beta- and gamma-synuclein.
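The amino acid frequencies listed above can be reproduced with the Python standard library. This is a minimal sketch, not necessarily the tool used for the analysis; the sequence is the UniProt P37840 sequence given above.

```python
from collections import Counter

# Alpha-synuclein (UniProt P37840), as listed above
SNCA = (
    "MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVH"
    "GVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKKDQL"
    "GKNEEGAPQEGILEDMPVDPDNEAYEMPSEEGYQDYEPEA"
)

counts = Counter(SNCA)

# Print residues from most to least frequent, with percentages
for aa, n in counts.most_common():
    print(f"{aa}: {n} ({100 * n / len(SNCA):.2f}%)")
```

Residues with equal counts may print in a different order than in the list above, since ties are not sorted alphabetically.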

3. Protein Structure

Protein Data Bank PDB Structure Page: micelle-bound human alpha-synuclein, 1XQ8 | pdb_00001xq8 and disease-relevant fiber structure, 6A6B | pdb_00006a6b

The structure of the monomer was solved with NMR and published in 2005. The structure of the fiber was solved with EM at a resolution of 3.07 Å and published in 2018. No molecules other than SNCA are present in the structures.

The protein belongs to the Synuclein structure classification family (Structural Classification of Proteins).

4. Visualization

PyMOL cartoon visualization of an SNCA aggregate (6A6B) EM structure and of an SNCA monomer (1XQ8) NMR structure. Colours by secondary structure.

PyMOL cartoon visualization of the same structures, coloured by domain. Blue (37–60): N-terminal region; orange (61–95): NAC region; red (96–99): C-terminal region.

Part C. Using ML-Based Protein Design Tools

I chose human mitochondrial protease ClpP, the protease component of the ClpXP complex that cleaves peptides and various proteins in an ATP-dependent process. ClpP binds the NAC domain of SNCA, and this binding inhibits the protease.

ClpP Protease — Sequence
UniProt ID: Q16740 | PDB: 1TG6
MWPGILVGGARVASCRYPALGPRLAAHFPAQRPPQRTLQNGLALQRCLHATATRALPLIPIVVEQTGRGERAYDIYSRLLRERIVCVMGPIDDSVASLVIAQLLFLQSESNKKPIHMYINSPGGVVTAGLAIYDTMQYILNPICTWCVGQAASMGSLLLAAGTPGMRHSLPNSRIMIHQPSGGARGQATDIAIQAEEIMKLKKQLYNIYAKHTKQSLQVIESAMERDRYMSPMEAQEFGILDKVLVHPPQDGEDEPTLVQKEPVEAAPAAEPVPAST

C1. Protein Language Modeling

1. Deep Mutational Scan

The esm2_t6_8M_UR50D model (the shallowest ESM-2 model, with the fewest parameters) was used to generate an unsupervised deep mutational scan of the ClpP protein based on language model likelihoods.

clpp_Mutation_Scan

Most mutations are neutral (green, as likely as wildtype). There are several blue positions (121–122, 151, 153, and 176) at which all mutations score as highly unlikely; these positions are therefore probably structurally and functionally important and may be part of the active centre of the protease.

The mutation scan mapped the active-site neighbourhood. The active-site residues listed in UniProt are Ser 153 and His 178. The positions at which almost all substitutions are unlikely are Ser 121, Pro 122, Ala 151, Ser 153, and Met 176 (numbering as in the sequence above).
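The idea behind a likelihood-based scan can be illustrated with a toy example. The sketch below is not the ESM-2 pipeline used for the heatmap: the per-position probabilities are made-up numbers standing in for what a language model would predict, and the scoring rule (log of mutant probability over wildtype probability) and the threshold are assumptions. Positions where every substitution scores far below wildtype are flagged as intolerant.

```python
import math


def log_likelihood_ratios(position_probs, wild_type):
    """Score each substitution as log(p_mutant / p_wildtype) at its position."""
    scans = []
    for probs, wt in zip(position_probs, wild_type):
        scans.append({
            aa: math.log(p / probs[wt])
            for aa, p in probs.items()
            if aa != wt
        })
    return scans


def intolerant_positions(scans, threshold=-2.0):
    """Return 1-based positions where all substitutions score below the threshold."""
    return [
        i + 1
        for i, scores in enumerate(scans)
        if all(score < threshold for score in scores.values())
    ]


# Hypothetical probabilities for a two-residue toy sequence "SL"
toy_probs = [
    {"S": 0.97, "A": 0.02, "W": 0.01},   # conserved position
    {"L": 0.40, "I": 0.35, "V": 0.25},   # tolerant position
]

scans = log_likelihood_ratios(toy_probs, "SL")
flagged = intolerant_positions(scans)
print("intolerant positions:", flagged)
```

With these toy numbers only the first position is flagged, which corresponds to a fully blue column in the heatmap.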

2. Latent Space Analysis

The provided sequence dataset was used to embed proteins in a reduced dimensionality space. To place ClpP into the resulting latent space map, its ESM2 embedding was first generated. Then, this new embedding was combined with the existing dataset’s embeddings, and the t-SNE algorithm was re-run on the combined set.

esm2_t6_8M_UR50D latent space with embedded human ClpP Q16740 (black).

Zoom-in into the esm2_t6_8M_UR50D latent space with embedded human ClpP Q16740 (black).

The neighbours are all ClpP homologs; therefore, the 6-layer model is correctly identifying sequence homology. Larger models like esm2_t33_650M or esm2_t36_3B are needed to capture more nuanced functional information.

C2. Protein Folding

Folding the original sequence

ClpP folded with ESMFold.

X-ray crystal structure of the ClpP subunit visualized with PyMOL (1TG6 | pdb_00001tg6).

X-ray crystal structure of ClpP with the subunit highlighted, visualized with PyMOL (1TG6 | pdb_00001tg6).

Resilience of the structure to changes in the original sequence

To test the resilience of the predicted fold, I replaced the residues at the 5 positions that are least tolerant to mutation (121, 122, 151, 153, 176) with tryptophan.

121 Ser (S)→ Trp (W)

122 Pro (P)→ Trp (W)

151 Ala (A)→ Trp (W)

153 Ser (S)→ Trp (W)

176 Met (M)→ Trp (W)
ClpP Protease — Original Sequence
UniProt ID: Q16740 | PDB: 1TG6
MWPGILVGGARVASCRYPALGPRLAAHFPAQRPPQRTLQNGLALQRCLHATATRALPLIPIVVEQTGRGERAYDIYSRLLRERIVCVMGPIDDSVASLVIAQLLFLQSESNKKPIHMYINSPGGVVTAGLAIYDTMQYILNPICTWCVGQAASMGSLLLAAGTPGMRHSLPNSRIMIHQPSGGARGQATDIAIQAEEIMKLKKQLYNIYAKHTKQSLQVIESAMERDRYMSPMEAQEFGILDKVLVHPPQDGEDEPTLVQKEPVEAAPAAEPVPAST
ClpP Protease — Modified Sequence
MWPGILVGGARVASCRYPALGPRLAAHFPAQRPPQRTLQNGLALQRCLHATATRALPLIPIVVEQTGRGERAYDIYSRLLRERIVCVMGPIDDSVASLVIAQLLFLQSESNKKPIHMYINWWGGVVTAGLAIYDTMQYILNPICTWCVGQWAWMGSLLLAAGTPGMRHSLPNSRIWIHQPSGGARGQATDIAIQAEEIMKLKKQLYNIYAKHTKQSLQVIESAMERDRYMSPMEAQEFGILDKVLVHPPQDGEDEPTLVQKEPVEAAPAAEPVPAST
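The modified sequence can be generated and checked programmatically. The sketch below is a minimal example; the function name is hypothetical, and the sequence and substitutions are the ones listed above. It refuses to apply a substitution if the wildtype residue at that position does not match, which guards against numbering mistakes.

```python
# Human ClpP (UniProt Q16740), as listed above
CLPP = (
    "MWPGILVGGARVASCRYPALGPRLAAHFPAQRPPQRTLQNGLALQRCLHATATRALPLIP"
    "IVVEQTGRGERAYDIYSRLLRERIVCVMGPIDDSVASLVIAQLLFLQSESNKKPIHMYIN"
    "SPGGVVTAGLAIYDTMQYILNPICTWCVGQAASMGSLLLAAGTPGMRHSLPNSRIMIHQP"
    "SGGARGQATDIAIQAEEIMKLKKQLYNIYAKHTKQSLQVIESAMERDRYMSPMEAQEFGI"
    "LDKVLVHPPQDGEDEPTLVQKEPVEAAPAAEPVPAST"
)

# (1-based position, wildtype residue, new residue)
MUTATIONS = [(121, "S", "W"), (122, "P", "W"), (151, "A", "W"),
             (153, "S", "W"), (176, "M", "W")]


def apply_mutations(seq, mutations):
    """Apply point substitutions after checking the wildtype residue at each position."""
    residues = list(seq)
    for pos, wt, new in mutations:
        if residues[pos - 1] != wt:
            raise ValueError(f"expected {wt} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


mutant = apply_mutations(CLPP, MUTATIONS)

# List the positions that differ between wildtype and mutant
changed = [i + 1 for i, (a, b) in enumerate(zip(CLPP, mutant)) if a != b]
print("changed positions:", changed)
```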

The changed residues (positions 121, 122, 151, 153, 176) highlighted in red within the original subunit structure visualized with PyMOL.

The modified sequence was then folded with ESMFold.

A fold of the modified ClpP sequence generated with ESMFold.

A fold of the original ClpP sequence generated with ESMFold.

C3. Protein Generation

As an input, I used the ClpP subunit isolated previously from (1TG6 | pdb_00001tg6):

X-ray crystal structure of the ClpP subunit visualized with PyMOL (1TG6 | pdb_00001tg6).

A sequence generated with inverse folding with ProteinMPNN

ClpP Protease — ProteinMPNN-generated Sequence
GAAPVLAXXXXXXXXXATLEEALLAQRVVLVRGPIDAALAAKVVAQLDALEAESPTAPITLLIDSPGGDYDAGLAILDRIRAIPNPVRTWAVGQAASMGALLLASGTPGLRFSTPDARIAIHKVSGTASGSPEELAEQKAALEAKNEELADLLSEYTGQSLETIKEAMKEVNYLTPEEAKEFGLLDHVLAEPP
T=0.1, sample=0, score=0.8587, seq_recovery=0.4402
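A value like seq_recovery can be understood as the fraction of positions at which the designed sequence keeps the native residue. The sketch below illustrates that idea only; ProteinMPNN's exact definition (for example, which positions it counts) may differ. The two 10-residue fragments are taken from the native and generated sequences above purely for illustration, and their alignment is assumed.

```python
def sequence_recovery(native, designed, skip="X-"):
    """Fraction of compared positions where the designed residue equals the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = [
        (a, b) for a, b in zip(native, designed)
        if a not in skip and b not in skip
    ]
    if not compared:
        return 0.0
    return sum(a == b for a, b in compared) / len(compared)


native_fragment = "PIDDSVASLV"     # fragment of the native ClpP sequence
designed_fragment = "PIDAALAAKV"   # fragment of the ProteinMPNN sequence (alignment assumed)

recovery = sequence_recovery(native_fragment, designed_fragment)
print(f"sequence recovery: {recovery:.2f}")
```

Positions marked X (unknown residues) are skipped rather than counted as mismatches.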

The generated sequence (with the Xs changed to As) was then folded with ESMFold to compare with the fold predicted from the original sequence and with the published ClpP structure.

A fold of the ProteinMPNN-generated ClpP sequence predicted by ESMFold. ESMFold inference for a sequence of length 193: pTM 0.861, pLDDT 90.378.

The original ClpP subunit structure, Q16740, visualized with PyMOL.

A fold of the original ClpP sequence predicted with ESMFold.

Week 5: Protein design, part II

Part C: Final Project: L-Protein Mutants (variable sites underlined)



Original Sequence

 Soluble N-terminal domain                            C-terminal domain
 METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSST  LYVLIFLAIFLS  KFTNQLLLSLLEAVIRTVTTLQQLLT

Variable sites identified by aligning BLAST results in Clustal Omega (8 in the N-terminus and 4 in the transmembrane domain highlighted):

METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSST LYVLIFLAIFLS KFTNQLLLSLLEAVIRTVTTLQQLLT


Mutated Sequence 1

METRFPRQSQETLESTNRRRPRKHEDYPCRRQQRSST LYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

For this mutant, I modified the N-terminal domain, aiming to stabilize the disordered domain. I introduced as many charged pairs as possible in the variable sites (changed 4 out of 8 in the N-terminal domain), and additionally changed one conserved site on the left side of the 2nd pair.

Summary of mutations

Conserved site changed: 13P->L

Variable sites changed: (7Q->R, 11Q->E, 14A->E, 22F->R)

Pairs introduced by changing the 4 variable sites: Pair 1 (R7–E11), Pair 2 (E14–R18), Pair 3 (R22–D26)

 Soluble N-terminal domain                            C-terminal domain
 METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSST  LYVLIFLAIFLS  KFTNQLLLSLLEAVIRTVTTLQQLLT  (Original Sequence)
       R---E LE---    R---                                                        (Mutated Sites)
       V   V CV       V                                                           (Conserved / Variable)
 METRFPRQSQETLESTNRRRPRKHEDYPCRRQQRSST  LYVLIFLAIFLS  KFTNQLLLSLLEAVIRTVTTLQQLLT  (Mutated Sequence 1)
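The mutation summary can be checked against the sequence with a short script. This is a minimal sketch under stated assumptions: a "pair" is taken to mean two oppositely charged residues (K/R with D/E) four positions apart, as in the pairs listed above, and the function names are hypothetical. Only the soluble N-terminal domain is used.

```python
import re

# Soluble N-terminal domain of the original sequence
ORIGINAL = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSST"
POSITIVE = set("KR")
NEGATIVE = set("DE")


def mutate(seq, mutations):
    """Apply mutations written as '7Q->R' (position, wildtype, new residue)."""
    residues = list(seq)
    for mutation in mutations:
        pos, wt, new = re.fullmatch(r"(\d+)([A-Z])->([A-Z])", mutation).groups()
        index = int(pos) - 1
        if residues[index] != wt:
            raise ValueError(f"expected {wt} at {pos}, found {residues[index]}")
        residues[index] = new
    return "".join(residues)


def charged_pairs(seq, spacing=4):
    """Return 1-based (i, i+spacing) positions holding oppositely charged residues."""
    pairs = set()
    for i in range(len(seq) - spacing):
        a, b = seq[i], seq[i + spacing]
        if (a in POSITIVE and b in NEGATIVE) or (a in NEGATIVE and b in POSITIVE):
            pairs.add((i + 1, i + 1 + spacing))
    return pairs


mutant1 = mutate(ORIGINAL, ["7Q->R", "11Q->E", "13P->L", "14A->E", "22F->R"])

# Pairs present in the mutant but not in the original sequence
new_pairs = sorted(charged_pairs(mutant1) - charged_pairs(ORIGINAL))
print(mutant1)
print("pairs introduced:", new_pairs)
```

Subtracting the pairs already present in the original sequence leaves only the pairs created by the mutations.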


Mutated Sequence 2

METRFPRQSQETLRSTNERRPRKHEDYPCRRQQRSST LYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

For this mutant, I modified the previous sequence (Mutated Sequence 1), aiming to further stabilize the disordered domain. I introduced 1 more mutation to a variable site to invert the second pair.

Summary of mutations

Conserved site changed: 13P->L

Variable sites changed: (7Q->R, 11Q->E, 14A->R, 18R->E, 22F->R)

Pairs introduced by changing the 5 variable sites: Pair 1 (R7–E11), Pair 2 (R14–E18), Pair 3 (R22–D26)

 Soluble N-terminal domain                            C-terminal domain
 METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSST  LYVLIFLAIFLS  KFTNQLLLSLLEAVIRTVTTLQQLLT  (Original Sequence)
       R---E LR---E   R---                                                        (Mutated Sites) 
       V   V CV   V   V                                                           (Conserved / Variable)
 METRFPRQSQETLRSTNERRPRKHEDYPCRRQQRSST  LYVLIFLAIFLS  KFTNQLLLSLLEAVIRTVTTLQQLLT  (Mutated Sequence 2)


AlphaFold Server was used to fold the monomers of Mutated Sequence 1 and Mutated Sequence 2. alphafold2_multimer_v2 was used to fold the multimers. alphafold2_multimer_v2 parameters used:

num_relax: 0
template_mode: none
msa_mode: mmseqs2_uniref_env
Pair mode: paired
num_recycles: 3 
recycle_early_stop_tolerance: auto
relax_max_iterations: 200
pairing_strategy: greedy
max_msa: auto
num_seeds: 1



Original Sequence
Multimer

Mutant 1
pLDDT = 37.6, pTM = 0.189, ipTM = 0.127. 3 pairs/bridges introduced, 1 conserved site changed (13P->L), RRR site kept (1 conserved and 4 variable sites changed)

Mutant 2
pLDDT = 45.8, pTM = 0.187, ipTM = 0.126. 3 pairs/bridges introduced, 1 conserved site changed (13P->L), 2nd pair inverted, no RRR site (1 conserved and 5 variable sites changed)

Original Sequence
AlphaFold ipTM = -, pTM = 0.44

Mutant 1
AlphaFold ipTM = -, pTM = 0.43

Mutant 2
AlphaFold ipTM = -, pTM = 0.44


Mutated Sequence 3

METRFPRQSQETPESTNRRRPRKHEDYPCRRQQRSST LYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

This sequence was designed to explore whether changing the conserved site (13P->L) was required to achieve the same structure as that of the Mutated Sequence 1. For that, the mutated conserved site of the Sequence 1 was changed back to the original (13L->P).

Summary of mutations

Conserved site changed: None

Variable sites changed (as in Mutated Sequence 1): (7Q->R, 11Q->E, 14A->E, 22F->R)

Pairs introduced by changing the 4 variable sites (as in Mutated Sequence 1): Pair 1 (R7–E11), Pair 2 (E14–R18), Pair 3 (R22–D26)

 Soluble N-terminal domain                            C-terminal domain
 METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSST  LYVLIFLAIFLS  KFTNQLLLSLLEAVIRTVTTLQQLLT  (Original Sequence)
       R---E LE---    R---                                                        (Mutated Sites)
       V   V CV       V                                                           (Conserved / Variable)
 METRFPRQSQETLESTNRRRPRKHEDYPCRRQQRSST  LYVLIFLAIFLS  KFTNQLLLSLLEAVIRTVTTLQQLLT  (Mutated Sequence 1)
             L                                                                    (Reverted site)
             C                                                                    (Conserved / Variable)     
 METRFPRQSQETPESTNRRRPRKHEDYPCRRQQRSST  LYVLIFLAIFLS  KFTNQLLLSLLEAVIRTVTTLQQLLT  (Mutated Sequence 3)
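
The mutation summary above can be checked mechanically. The following is a minimal Python sketch, not the workflow used for this homework: it applies the four listed variable-site substitutions to the N-terminal domain of the Original Sequence (leaving proline at position 13) and scans for oppositely charged residues four positions apart, which is the spacing of the listed pairs. The function names and the simple charge classification (K/R positive, D/E negative, histidine ignored) are assumptions made for illustration.

```python
# N-terminal domain of the Original Sequence, copied from the alignment above.
ORIGINAL_N_TERM = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSST"

# Variable-site substitutions as (1-based position, original residue, new residue).
VARIABLE_SITE_MUTATIONS = [(7, "Q", "R"), (11, "Q", "E"), (14, "A", "E"), (22, "F", "R")]

POSITIVE = set("KR")  # assumption: histidine is treated as uncharged
NEGATIVE = set("DE")


def apply_mutations(seq, mutations):
    """Return seq with the point mutations applied, checking each original residue."""
    residues = list(seq)
    for pos, old, new in mutations:
        if residues[pos - 1] != old:
            raise ValueError(f"expected {old} at position {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


def charged_pairs(seq, spacing=4):
    """List (i, i+spacing) residue pairs that carry opposite charges."""
    pairs = []
    for i in range(len(seq) - spacing):
        a, b = seq[i], seq[i + spacing]
        if (a in POSITIVE and b in NEGATIVE) or (a in NEGATIVE and b in POSITIVE):
            pairs.append((f"{a}{i + 1}", f"{b}{i + 1 + spacing}"))
    return pairs


mutant_3 = apply_mutations(ORIGINAL_N_TERM, VARIABLE_SITE_MUTATIONS)

# Keep only pairs that are absent from the Original Sequence.
original_pairs = charged_pairs(ORIGINAL_N_TERM)
new_pairs = [p for p in charged_pairs(mutant_3) if p not in original_pairs]

print(mutant_3)
print(new_pairs)
```

Under this classification the Original Sequence already contains one such pair (D26 and R30), so it is filtered out and only the newly introduced pairs are reported.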


Mutated Sequence 4

METRFPRQSQETPRSTNERRPRKHEDYPCRRQQRSST LYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

This sequence was designed to test whether changing the conserved site (13P->L) is required to obtain the helix seen in Mutated Sequence 2. To that end, the mutated conserved site of Mutated Sequence 2 was reverted to the original residue (13L->P).

Summary of mutations

Conserved site changed: None

Variable sites changed (as in Mutated Sequence 2): (7Q->R, 11Q->E, 14A->R, 18R->E, 22F->R)

Pairs introduced by changing the 5 variable sites (as in Mutated Sequence 2): Pair 1 (R7–E11), Pair 2 (R14–E18, inverted), Pair 3 (R22–D26)

 Soluble N-terminal domain                            C-terminal domain
 METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSST  LYVLIFLAIFLS  KFTNQLLLSLLEAVIRTVTTLQQLLT  (Original Sequence)
       R---E LR---E   R---                                                        (Mutated Sites) 
       V   V CV   V   V                                                           (Conserved / Variable)
 METRFPRQSQETLRSTNERRPRKHEDYPCRRQQRSST  LYVLIFLAIFLS  KFTNQLLLSLLEAVIRTVTTLQQLLT  (Mutated Sequence 2)
             L                                                                    (Reverted site)
             C                                                                    (Conserved / Variable)     
 METRFPRQSQETPRSTNERRPRKHEDYPCRRQQRSST  LYVLIFLAIFLS  KFTNQLLLSLLEAVIRTVTTLQQLLT  (Mutated Sequence 4)
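
As a cross-check on the alignment above, the substitutions can be derived by comparing the sequences position by position. This is a small illustrative sketch, assuming equal-length sequences and using a hypothetical helper `list_mutations`; it reports changes in the same notation as the summary (for example 13P->L) and builds the reverted variant by restoring the original residue at position 13 of Mutated Sequence 2.

```python
# N-terminal domains copied from the alignment above.
ORIGINAL_N_TERM = "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSST"
MUTANT_2_N_TERM = "METRFPRQSQETLRSTNERRPRKHEDYPCRRQQRSST"


def list_mutations(reference, variant):
    """Return point mutations as strings like '7Q->R' (1-based positions)."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        f"{pos}{ref}->{var}"
        for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


def revert_site(variant, reference, position):
    """Restore the reference residue at one 1-based position of the variant."""
    idx = position - 1
    return variant[:idx] + reference[idx] + variant[idx + 1:]


mutations_2 = list_mutations(ORIGINAL_N_TERM, MUTANT_2_N_TERM)
mutant_4 = revert_site(MUTANT_2_N_TERM, ORIGINAL_N_TERM, 13)
mutations_4 = list_mutations(ORIGINAL_N_TERM, mutant_4)

print(mutations_2)
print(mutant_4)
print(mutations_4)
```

The difference between the two printed mutation lists is exactly the conserved-site change, which is the only substitution the reversion is meant to undo.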


alphafold2_multimer_v2 was used to fold the multimers of Mutated Sequence 3 and Mutated Sequence 4. alphafold2_multimer_v2 parameters used:

num_relax: 0
template_mode: none
msa_mode: mmseqs2_uniref_env
Pair mode: paired
num_recycles: 3 
recycle_early_stop_tolerance: auto
relax_max_iterations: 200
pairing_strategy: greedy
max_msa: auto
num_seeds: 1
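
For a multimer run, ColabFold expects the chains of one complex in a single record, joined by a colon. The sketch below shows one way to build such a record in Python. The number of copies is a placeholder, since the oligomeric state is not stated in this section, and the helper name `colabfold_record` is hypothetical. The Original Sequence is used as the example input.

```python
# Full Original Sequence: N-terminal domain + middle segment + C-terminal domain.
ORIGINAL_SEQUENCE = (
    "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSST"
    "LYVLIFLAIFLS"
    "KFTNQLLLSLLEAVIRTVTTLQQLLT"
)

N_COPIES = 2  # placeholder: set to the number of chains actually modelled


def colabfold_record(name, sequence, n_copies):
    """Build one FASTA record with n_copies identical chains joined by ':'."""
    if n_copies < 1:
        raise ValueError("n_copies must be at least 1")
    chains = ":".join([sequence] * n_copies)
    return f">{name}\n{chains}\n"


record = colabfold_record("original", ORIGINAL_SEQUENCE, N_COPIES)
print(record)
```

The same helper can be reused for each mutated sequence, so that all variants are folded with identical chain counts and settings.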



Mutant 1
pLDDT = 37.6, pTM = 0.189, ipTM = 0.127. 3 pairs/bridges introduced, 1 conserved site changed (13P->L), RRR site kept (1 conserved and 4 variable sites changed)

Mutant 3
pLDDT = 43.3, pTM = 0.188, ipTM = 0.127. Derived from Mutant 1 with the conserved-site mutation reverted (13L->P); 4 variable sites of the Original Sequence changed

Mutant 2
pLDDT = 45.8, pTM = 0.187, ipTM = 0.126. 3 pairs/bridges introduced, 1 conserved site changed (13P->L), 2nd pair inverted, no RRR site (1 conserved and 5 variable sites changed)

Mutant 4
pLDDT = 37, pTM = 0.189, ipTM = 0.127. Derived from Mutant 2 with the conserved-site mutation reverted (13L->P); 5 variable sites of the Original Sequence changed
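
The four sets of scores above can be compared side by side. The following Python sketch tabulates the reported values and computes the change in pLDDT when the conserved-site mutation is reverted. The confidence bands follow the commonly quoted AlphaFold convention (above 90 very high, 70 to 90 high, 50 to 70 low, below 50 very low). Applying them to a single averaged pLDDT value is an assumption made here for a rough reading only.

```python
# Scores copied from the captions above.
RESULTS = {
    "Mutant 1": {"pLDDT": 37.6, "pTM": 0.189, "ipTM": 0.127},
    "Mutant 3": {"pLDDT": 43.3, "pTM": 0.188, "ipTM": 0.127},
    "Mutant 2": {"pLDDT": 45.8, "pTM": 0.187, "ipTM": 0.126},
    "Mutant 4": {"pLDDT": 37.0, "pTM": 0.189, "ipTM": 0.127},
}

# (parent with 13P->L, derived mutant with 13L->P reverted)
REVERSIONS = [("Mutant 1", "Mutant 3"), ("Mutant 2", "Mutant 4")]


def plddt_band(plddt):
    """Map a pLDDT value to the usual AlphaFold confidence band."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "high"
    if plddt > 50:
        return "low"
    return "very low"


bands = {name: plddt_band(scores["pLDDT"]) for name, scores in RESULTS.items()}
deltas = {
    derived: round(RESULTS[derived]["pLDDT"] - RESULTS[parent]["pLDDT"], 1)
    for parent, derived in REVERSIONS
}

for name, scores in RESULTS.items():
    print(f"{name}: pLDDT={scores['pLDDT']} ({bands[name]}), "
          f"pTM={scores['pTM']}, ipTM={scores['ipTM']}")
for derived, delta in deltas.items():
    print(f"{derived}: pLDDT change after reverting position 13 = {delta:+}")
```

With the values reported here, all four predictions fall in the lowest band, and reverting position 13 moves pLDDT in opposite directions for the two parent mutants while pTM and ipTM stay essentially unchanged.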


Subsections of Labs

Week 1 Lab: Pipetting


Subsections of Projects

Individual Final Project


Group Final Project
