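Assignment: DNA Assembly
What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose? The mix contains Phusion High-Fidelity DNA polymerase, MgCl2, dNTPs, and reaction buffer. The polymerase has proofreading (3′→5′ exonuclease) activity, which keeps the error rate of the polymerization reaction low. Mg²⁺ is the cofactor the polymerase needs, the dNTPs are the building blocks of the new strands, and the buffer keeps pH and salt conditions suitable for the enzyme.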
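What are some factors that determine primer annealing temperature during PCR? Mainly the primer's melting temperature, which depends on how many hydrogen bonds hold the primer to the template: its base composition (G–C pairs form three hydrogen bonds, A–T pairs form two) and its length. Salt and primer concentrations also affect it.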
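As a rough illustration of how base composition and length set the melting temperature, here is a sketch of two textbook approximations: the Wallace rule, Tm = 2(A+T) + 4(G+C) °C, intended for short oligos, and a basic GC-content formula for longer primers, Tm = 64.9 + 41(G+C − 16.4)/N. Both ignore salt and primer concentration, so a vendor Tm calculator should be preferred in practice; the primer sequence below is hypothetical.

```python
def wallace_tm(primer: str) -> float:
    """Wallace rule: 2 °C per A/T and 4 °C per G/C (short oligos only)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2.0 * at + 4.0 * gc


def gc_tm(primer: str) -> float:
    """Basic GC-content estimate for longer primers: 64.9 + 41*(GC - 16.4)/N."""
    p = primer.upper()
    gc = p.count("G") + p.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(p)


primer = "GGGGCCCCAAAATTTTGGCC"  # hypothetical 20-nt primer
print(f"Wallace Tm: {wallace_tm(primer):.1f} °C")
print(f"GC-based Tm: {gc_tm(primer):.1f} °C")
```

A common rule of thumb then sets the annealing temperature a few degrees below the estimated Tm.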
Project Description
I want to develop a synthetic asymmetric cell division system to enable synthetic cell differentiation, as a toolkit for engineering multicellular organization, development, pattern formation, and more.
For further detail, see the final project page.
The project has two major subprojects:
A mitotic counter: a circuit that records, as distinct states, how many cell divisions the lineage has undergone since the system was activated (first division, second, third, and so on). The system uses the natural fluctuations of cell-cycle regulators to induce or activate distinct proteins. When the system is activated, a transcription factor (TF1) is activated by a cell-cycle-dependent kinase active in G1. TF1 induces the expression of a recombinase that inverts the sequence of its own promoter, which leads to the expression of a second, initially inactive, TF (TF2). TF2 is activated by a second cell-cycle-dependent kinase that is active at the end of the cell cycle and is never present at the same time as the first kinase. Phosphorylation of TF2 could, for example, link two homodimers together into an active form (one option, not necessarily how it would work). When kinase 2 activates TF2, gene expression is not yet possible because chromatin is condensed during mitosis. After the cell divides, in G1, TF2 can induce other genes, which, following the same architecture, enable the activation of further genes at the second division, and so on, making it possible to count cell divisions in the lineage. The system also needs additional parts, such as a degradation system and repression of past states.
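The counting logic can be illustrated with a toy discrete model. It is only a sketch of the idea, under my own simplifying assumptions: one step per cell-cycle phase, a finite number of engineered stages after which the counter saturates, and no leakiness, degradation, or noise. The key point it shows is that activation happens in M while transcription only happens in the next G1, so the state advances exactly once per division.

```python
from typing import List

PHASES = ("G1", "S", "G2", "M")


def simulate_counter(n_cycles: int, max_state: int) -> List[int]:
    """Toy model of the proposed mitotic counter (illustrative, not biological).

    Returns the counter state during each cell cycle after activation.
    State k means TF_k is the active transcription factor.
    """
    state = 0       # 0 = counter not yet started
    primed = False  # next-stage TF phosphorylated by the late-cycle kinase (kinase 2)
    states = []
    for _ in range(n_cycles):
        for phase in PHASES:
            if phase == "G1":
                if state == 0:
                    # Kinase 1 (G1) activates TF1 -> recombinase flips its own
                    # promoter -> the next TF is expressed, still inactive.
                    state = 1
                elif primed and state < max_state:
                    # Chromatin is decondensed again: the primed TF drives the
                    # next stage, and the previous stage is repressed.
                    state += 1
                primed = False
            elif phase == "M":
                # Kinase 2 phosphorylates the next TF, but condensed chromatin
                # blocks transcription until the daughter cell reaches G1.
                primed = True
        states.append(state)
    return states


print(simulate_counter(5, max_state=3))  # → [1, 2, 3, 3, 3]
```

Saturation at `max_state` reflects the fact that a real circuit can only count as far as the number of engineered stages.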
Asymmetric component: How can we engineer an asymmetric distribution of molecules to induce asymmetric cell division? The solution should be transferable to other organisms. The proposal is to use a synthetic phase-separated condensate that forms a single, stable condensate in the cell. The condensate would carry an mRNA encoding a TF. It would form upon activation of the scaffolds and dissolve upon cell division, enabling translation of the selected mRNA. The mRNA would be sequestered in the condensate, where its translation initiation site is blocked and where it is protected from degradation by nucleases targeted to that mRNA. All of this would be achieved by linking the binding sites of the scaffolds to the key sites of the mRNA (nuclease recognition site, translation initiation site, ribosome binding site, etc.). The scaffolds would be inactivated upon cell division by phosphorylation at their binding sites, releasing the mRNA.
But how can we make a single condensate exist? The idea is to use the pyrenoid (a Rubisco condensate that makes CO2 fixation more efficient) as a model: through a balance between phosphatases and kinases, the alga maintains the pyrenoid as a single condensate, or as multiple condensates when needed.
So how will all of this work together to make the asymmetric cell division system function?
When both systems are active (the mitotic counter and the phase-separated condensate system), the single condensate is segregated to one of the daughter cells. After division, TF2 of the mitotic counter could express a kinase that dissolves the condensate, enabling translation of the mRNA. The system could also support logic gates: depending on whether both components are present, gene X is activated. In the other daughter cell, which only has TF2, AND, NOT, or OR gates based on this could drive differentiation as well, toward a fate different from the parent cell's, or even more complex differentiation dynamics. Either way, the now-different cell could express a self-inducing transcription regulator that would maintain the differentiation of the cell itself and of its daughter cells (among other feedbacks).
Also the mRNA TF should repress the component of mRNA degradation.
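The gate logic described above (condensate AND TF2 in one daughter, TF2 AND NOT condensate in the other) can be written as a small truth-table sketch. The fate labels are hypothetical placeholders, and the model assumes the single condensate is always inherited whole by exactly one daughter.

```python
def divide(parent_has_condensate: bool = True) -> tuple:
    """A single condensate cannot be split: exactly one daughter inherits it."""
    return (True, False) if parent_has_condensate else (False, False)


def daughter_fate(has_condensate: bool, tf2_active: bool) -> str:
    """Hypothetical gate logic; the fate labels are placeholders."""
    if tf2_active and has_condensate:
        # TF2 -> kinase dissolves the condensate -> the stored TF mRNA is translated
        return "fate A"
    if tf2_active and not has_condensate:
        # TF2 AND NOT condensate: the sister cell can take a second fate
        return "fate B"
    return "parental"


for i, inherited in enumerate(divide()):
    print(f"daughter {i}: {daughter_fate(inherited, tf2_active=True)}")
```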
And now what?
These systems are toolkits for the discipline of multicellular engineering. To create organisms directly, we would need an autonomous cell differentiation system, one that can activate itself. The mitotic counter could also, for example, act as a signaling event at specific cell divisions in a designated lineage, activating cell-to-cell communication at that moment and only at that moment.
Possibilities are endless and only the imagination is the limit.
Governance Context
Autonomous multicellular systems capable of growth, differentiation, and spatial organization introduce additional biosafety and biosecurity considerations compared to single-cell engineering. Risks include unintended environmental persistence, uncontrolled proliferation, ecological interaction, and potential dual-use of self-assembling biological structures. Governance must therefore balance intrinsic biological containment, institutional oversight, and research feasibility.
Options considered
Option 1 — Mandatory Genetic Containment and Fail-Safe Design Intrinsic safeguards such as auxotrophy, kill-switches, replication limits, and environmental sensitivity.
Option 2 — Project Licensing, Design Review, and Traceability Registration, documentation, biosafety review, strain tracking, training, and incident reporting.
Option 3 — Voluntary Standards and Community Best Practices Non-binding guidelines and self-governance without formal enforcement.
Scoring scale: 1 (low contribution) to 5 (strong contribution).
| Does the option: | Option 1 | Option 2 | Option 3 |
|---|---|---|---|
| Enhance Biosecurity | | | |
| • By preventing incidents | 5 | 4 | 2 |
| • By helping respond | 3 | 5 | 2 |
| Foster Lab Safety | | | |
| • By preventing incidents | 5 | 4 | 3 |
| • By helping respond | 3 | 5 | 3 |
| Protect the environment | | | |
| • By preventing incidents | 5 | 4 | 1 |
| • By helping respond | 3 | 4 | 2 |
| Other considerations | | | |
| • Minimizing costs and burdens to stakeholders | 2 | 3 | 5 |
| • Feasibility | 3 | 4 | 5 |
| • Not impede research | 2 | 4 | 5 |
| • Promote constructive applications | 4 | 5 | 3 |
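To compare the options at a glance, the scores can be summed per option. This sketch assumes every criterion carries equal weight, which the scoring does not state; the optional `weights` argument shows how that assumption could be changed.

```python
SCORES = {
    # criterion: (Option 1, Option 2, Option 3), copied from the scores above
    "Biosecurity: preventing incidents": (5, 4, 2),
    "Biosecurity: helping respond": (3, 5, 2),
    "Lab safety: preventing incidents": (5, 4, 3),
    "Lab safety: helping respond": (3, 5, 3),
    "Environment: preventing incidents": (5, 4, 1),
    "Environment: helping respond": (3, 4, 2),
    "Minimizing costs and burdens": (2, 3, 5),
    "Feasibility": (3, 4, 5),
    "Not impede research": (2, 4, 5),
    "Promote constructive applications": (4, 5, 3),
}


def option_totals(scores, weights=None):
    """Weighted sum per option; missing weights default to 1 (equal importance)."""
    weights = weights or {}
    totals = [0.0, 0.0, 0.0]
    for criterion, row in scores.items():
        w = weights.get(criterion, 1.0)
        for i, score in enumerate(row):
            totals[i] += w * score
    return totals


print(option_totals(SCORES))  # → [35.0, 42.0, 31.0]
```

With equal weights Option 2 scores highest, but doubling the weight of a prevention criterion, for example, would shift the balance toward Option 1.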
Homework pre HW2:
Answers to Professor Jacobson
DNA replication is carried out by DNA polymerases, which exhibit a raw error rate of approximately 10⁻⁵ errors per base incorporated. However, proofreading activity and mismatch-repair pathways dramatically improve fidelity, reducing the effective error rate to roughly 10⁻⁹–10⁻¹⁰ per base.
Given that the human genome contains about 3 × 10⁹ base pairs, an uncorrected replication process would introduce tens of thousands of mutations per cell division (3 × 10⁹ × 10⁻⁵ ≈ 30,000). In practice, only a few mutations or fewer accumulate per division (3 × 10⁹ × 10⁻⁹ to 10⁻¹⁰ ≈ 3 to 0.3). This discrepancy is resolved through polymerase exonuclease proofreading and post-replication DNA repair systems that detect and correct mismatches before they become permanent mutations.
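The arithmetic behind these estimates is just genome size times per-base error rate. The sketch uses the haploid genome size from the text; a diploid cell copies roughly twice as many bases, which would double each figure.

```python
import math

GENOME_BP = 3e9  # haploid human genome size used in the text


def expected_mutations(bases: float, errors_per_base: float) -> float:
    """Expected number of errors when copying `bases` nucleotides once."""
    return bases * errors_per_base


raw = expected_mutations(GENOME_BP, 1e-5)   # polymerase alone
high = expected_mutations(GENOME_BP, 1e-9)  # with proofreading and mismatch repair
low = expected_mutations(GENOME_BP, 1e-10)
print(f"uncorrected: ~{raw:,.0f} per division")
print(f"corrected:   ~{low:.1f} to {high:.1f} per division")
```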
Because the genetic code is degenerate, most amino acids are encoded by multiple synonymous codons (on average roughly three per amino acid). Consequently, a typical human protein of several hundred amino acids could theoretically be encoded by an astronomically large number of different DNA sequences. In practice, however, only a small subset of these sequences functions efficiently. Constraints include codon bias and tRNA availability, unfavorable mRNA secondary structures, extreme GC content, unintended splice or regulatory signals, reduced translation efficiency, and technical limitations in DNA synthesis or cloning. These factors strongly restrict the number of sequences that yield robust protein expression.
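The "astronomically large number" follows from multiplying the number of synonymous codons at each position. The sketch below uses the standard genetic code (61 sense codons) and excludes the stop codon; for long proteins the log10 form is more convenient than the exact integer.

```python
import math

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(protein: str) -> int:
    """Number of distinct coding sequences (stop codon excluded) for a protein."""
    return math.prod(CODONS_PER_AA[aa] for aa in protein.upper())


def log10_encodings(protein: str) -> float:
    """Same count on a log10 scale, convenient for proteins of hundreds of residues."""
    return sum(math.log10(CODONS_PER_AA[aa]) for aa in protein.upper())


print(count_encodings("MDPP"))  # 1 * 2 * 4 * 4 = 32
```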
Answers to Professor LeProust
The most widely used method for oligonucleotide synthesis is solid-phase phosphoramidite chemical synthesis. In this approach, DNA is assembled one nucleotide at a time on a solid support through automated cycles of nucleotide coupling, washing, and deprotection. This chemistry has remained the industry standard for decades because it is fast, scalable, and reliable for producing short DNA fragments.
Oligos longer than approximately 150–200 nucleotides are difficult to synthesize efficiently because each coupling step has slightly less than perfect efficiency. A small fraction of strands fails to extend at every cycle, and these losses accumulate exponentially as the number of cycles increases. As a result, the final mixture contains many truncated or error-containing products, while the yield of full-length DNA becomes very low. Additional chemical side reactions and base damage further degrade quality.
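The exponential loss can be made concrete: if each coupling succeeds with probability p, the fraction of full-length strands after n nucleotides is p^(n−1). The 99% coupling efficiency used below is an illustrative assumption, not a measured value.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of strands that reach full length: one coupling per added base."""
    return coupling_efficiency ** (length_nt - 1)


for n in (60, 200, 2000):
    print(n, f"{full_length_fraction(n, 0.99):.2e}")
```

At 99% per step, roughly 13% of 200-mers are full length, while a 2000-mer falls to the order of 10⁻⁹.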
For this reason, a 2000 base-pair gene cannot be synthesized directly. The cumulative inefficiency across thousands of addition steps makes the probability of obtaining a correct full-length product essentially zero. Instead, long genes are constructed by synthesizing many shorter oligonucleotides and assembling them enzymatically using methods such as PCR assembly or Gibson assembly.
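A minimal sketch of the first design step, splitting a long gene into short oligos whose neighbours share an overlap that the assembly reaction can anneal. It is simplified on purpose: real PCR assembly designs alternate strands and balance overlap melting temperatures, which this sketch does not do, and the 60-nt length and 20-nt overlap are assumed defaults.

```python
def tile_oligos(seq: str, oligo_len: int = 60, overlap: int = 20) -> list:
    """Split `seq` into oligos where neighbours share exactly `overlap` bases."""
    if not 0 < overlap < oligo_len:
        raise ValueError("need 0 < overlap < oligo_len")
    step = oligo_len - overlap
    oligos, start = [], 0
    while True:
        end = min(start + oligo_len, len(seq))
        oligos.append(seq[start:end])
        if end == len(seq):
            return oligos
        start += step


def reassemble(oligos: list, overlap: int) -> str:
    """Join oligos by dropping each shared overlap, mimicking a perfect assembly."""
    return oligos[0] + "".join(o[overlap:] for o in oligos[1:])


gene = "ACGTTGCA" * 250  # placeholder 2000-nt sequence
pieces = tile_oligos(gene)
print(len(pieces), "oligos; reassembles correctly:", reassemble(pieces, 20) == gene)
```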
Answer to Professor Church
In most animals, including humans, nine amino acids cannot be synthesized de novo and must be obtained from the diet: histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine. Some sources include arginine as conditionally essential, particularly in growing organisms, yielding a list of ten.
Because lysine is already universally essential, engineering a synthetic “lysine contingency” does not create a fundamentally new biological vulnerability. Instead, it exploits an existing metabolic dependency. Organisms would already require external lysine, so forcing supplementation simply makes this requirement explicit rather than introducing a novel Achilles’ heel. Consequently, such a contingency is predictable and limited as a containment strategy rather than uniquely robust.
Week 2 HW: DNA read, write and edit
Homework
Part 1: Benchling and in-silico gel art
This is the virtual digestion of the Lambda genome with these restriction enzymes:
EcoRI,
HindIII,
BamHI,
KpnI,
EcoRV,
SacI and
SalI.
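The same kind of virtual digest can be sketched in a few lines: find each enzyme's recognition site, record where it cuts the top strand, and report fragment lengths for a linear molecule such as the Lambda genome. Loading the Lambda sequence is not shown, and the sketch ignores sticky-end overhangs and the cos ends; the demo string is a made-up test sequence.

```python
# Recognition site and top-strand cut offset (number of bases before the cut).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "KpnI": ("GGTACC", 5),     # GGTAC^C
    "EcoRV": ("GATATC", 3),    # GAT^ATC (blunt)
    "SacI": ("GAGCTC", 5),     # GAGCT^C
    "SalI": ("GTCGAC", 1),     # G^TCGAC
}


def cut_positions(seq: str, site: str, offset: int) -> list:
    """Top-strand cut coordinates; these sites are palindromic, so one strand suffices."""
    cuts, i = [], seq.find(site)
    while i != -1:
        cuts.append(i + offset)
        i = seq.find(site, i + 1)
    return cuts


def digest_linear(seq: str, enzyme_names) -> list:
    """Fragment lengths in bp, in 5'->3' order, for a linear molecule."""
    seq = seq.upper()
    cuts = sorted({c for name in enzyme_names for c in cut_positions(seq, *ENZYMES[name])})
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


demo = "AAAGAATTCAAAAGGATCCAAA"
print(digest_linear(demo, ["EcoRI", "BamHI"]))  # → [4, 10, 8]
```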
And here is some artwork:
Part 3: DNA design challenge
3.1- I chose DNA nucleotidylexotransferase (TdT) because it is used in enzymatic DNA synthesis, and being able to engineer it could lead to DNA synthesis in vivo, which would advance the field tremendously.
Reverse translation of sp|P04053|TDT_HUMAN DNA nucleotidylexotransferase OS=Homo sapiens OX=9606 GN=DNTT PE=1 SV=3 into a 1527-base sequence of the most likely codons.
3.3- Codon optimized for Saccharomyces cerevisiae:
ATGGATCCACCAAGAGCATCACATTTGTCCCCAAGAAAAAAAAGACCTAGGCAAACTGGTGCATTAATGGCTTCCTCTCCACAGGACATTAAATTCCAAGACTTAGTCGTTTTTATCCTCGAAAAAAAGATGGGTACTACAAGAAGAGCTTTTCTCATGGAATTGGCTAGACGTAAAGGTTTCAGAGTTGAAAATGAACTCAGTGATTCTGTTACACATATTGTTGCTGAAAATAATTCTGGTTCAGATGTTTTGGAATGGTTGCAAGCTCAAAAAGTTCAAGTCTCTTCTCAACCAGAATTGTTAGATGTCAGTTGGTTAATAGAATGCATTAGAGCTGGTAAGCCAGTTGAAATGACTGGCAAACATCAATTAGTAGTAAGAAGAGATTACTCAGATTCAACAAATCCAGGTCCACCAAAAACTCCACCAATAGCTGTTCAAAAAATTTCTCAATACGCTTGTCAACGTAGGACTACTTTAAATAATTGTAATCAAATTTTTACAGATGCTTTTGACATCTTGGCTGAAAACTGTGAATTTAGAGAAAATGAAGATAGCTGCGTTACGTTTATGAGAGCTGCCTCAGTTTTAAAAAGCTTACCTTTTACTATAATTTCTATGAAAGATACCGAAGGAATTCCATGTTTGGGTTCAAAAGTTAAGGGTATTATAGAAGAAATAATTGAAGATGGTGAATCATCAGAAGTTAAAGCAGTGTTGAATGATGAAAGATACCAATCTTTTAAATTATTTACTTCAGTCTTTGGTGTTGGTTTAAAAACCTCCGAAAAATGGTTTAGAATGGGCTTCAGAACCTTGAGTAAGGTGAGATCTGATAAATCATTGAAATTTACGAGAATGCAAAAAGCTGGTTTTTTGTACTACGAGGATTTGGTCAGCTGTGTTACTAGAGCTGAAGCTGAAGCTGTTTCTGTTCTGGTTAAAGAAGCAGTTTGGGCATTTTTGCCTGATGCTTTCGTTACAATGACAGGTGGTTTTAGAAGGGGTAAAAAAATGGGTCATGATGTAGATTTTTTGATTACTTCCCCTGGTTCTACTGAAGATGAAGAACAATTGCTACAAAAAGTGATGAACTTATGGGAGAAAAAAGGTTTGTTGTTGTACTATGATTTAGTTGAGTCTACTTTTGAAAAGTTGAGATTACCATCTCGCAAAGTTGATGCTTTAGATCATTTCCAAAAATGTTTTTTGATCTTTAAGTTGCCAAGACAAAGAGTAGATTCCGATCAGAGTTCCTGGCAAGAAGGTAAGACTTGGAAAGCCATTAGAGTCGACTTGGTTCTGTGTCCATACGAAAGAAGAGCTTTTGCATTATTAGGTTGGACAGGTTCCAGACAATTTGAAAGAGATTTGAGAAGATATGCCACACATGAGAGAAAGATGATTTTAGATAACCATGCATTATATGATAAAACTAAAAGAATCTTCTTGAAAGCCGAATCAGAGGAAGAAATTTTTGCCCATCTAGGTTTGGATTACATTGAACCATGGGAGAGAAATGCT
Avoid cleavage sites of restriction enzymes:
BbsI, BsaI
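BsaI and BbsI are Type IIS enzymes commonly used in Golden Gate assembly, which is a usual reason to remove their sites from a coding sequence. Unlike the palindromic sites in the digest above, their recognition sequences are not palindromic, so a check has to look at both orientations. A minimal sketch, assuming the standard recognition sequences GGTCTC (BsaI) and GAAGAC (BbsI):

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Type IIS sites are not palindromic, so both orientations must be checked.
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BbsI": "GAAGAC"}


def find_type_iis_sites(seq: str) -> dict:
    """0-based start positions of each enzyme's site on either strand."""
    seq = seq.upper()
    hits = {}
    for name, site in TYPE_IIS_SITES.items():
        targets = {site, reverse_complement(site)}
        hits[name] = [i for i in range(len(seq) - len(site) + 1)
                      if seq[i:i + len(site)] in targets]
    return hits


print(find_type_iis_sites("AAGGTCTCAAGTCTTC"))
```

Running it on the optimized sequence above should return empty lists for both enzymes if the optimization succeeded.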
3.4- I can produce it in S. cerevisiae by cloning the fragment into a vector with the corresponding parts (promoter, 5′ UTR with a Kozak-like translation initiation sequence, terminator, etc.). If I wanted to purify it with a chosen method, I could add a tag at the 3′ end of the sequence. Cell-free methods could also be used to produce it.
Part 4: prepare a Twist DNA Synthesis Order
Part 5: DNA Read/Write/Edit
Part 5.1: DNA Read
5.1- What DNA would you want to sequence (e.g., read) and why?
For my personal project, I would need to sequence many of the plasmids and constructs that I design. Genomic DNA sequencing may also be needed to verify that the sequences were integrated at the intended site.
(ii) In lecture, a variety of sequencing technologies were mentioned. What technology or technologies would you use to perform sequencing on your DNA and why?
Third-generation sequencing (TGS), for example Oxford Nanopore.
For plasmids and for checking site-specific genomic integration, I would first enrich the target sequences by PCR amplification and fragment them (e.g., by sonication). Then I would add adenine tails (dA-tailing) to the fragment ends, ligate the adapters required by the sequencing device, and add barcodes so that reads from different samples can be told apart (demultiplexed), using primers that bind these added regions to PCR-amplify the library. The last step is to prepare the material for the sequencing machine (dilution, pipetting). For transcriptome sequencing, a reverse transcription step is needed first to convert RNA into cDNA.
For nanopore sequencing, after library preparation, the sample is loaded onto the flow cell, where an ionic current flows through protein nanopores embedded in a membrane. A motor protein controls the translocation of the DNA strand through the pore; the bases occupying the pore disrupt the current in characteristic ways, and the instrument records these changes. The raw electrical signals are stored as FAST5 files and converted into FASTQ by basecalling algorithms.
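The FASTQ output of basecalling stores each read as four lines: an `@` header, the sequence, a `+` separator, and per-base quality characters. A minimal sketch of reading it and computing a mean quality score, assuming the usual Phred+33 encoding (quality = ASCII code − 33); real pipelines would use a dedicated parser.

```python
import io
from typing import Iterator, Tuple


def read_fastq(handle) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        qual = handle.readline().strip()
        yield header[1:].strip(), seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score, assuming Phred+33 encoding."""
    return sum(ord(c) - offset for c in qual) / len(qual)


demo = io.StringIO("@read1\nACGT\n+\nII#I\n")
for read_id, seq, qual in read_fastq(demo):
    print(read_id, seq, mean_phred(qual))  # read1 ACGT 30.5
```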
5.2 DNA write
(i) What DNA would you want to synthesize (e.g., write) and why?
I would like to synthesize the components of the system (e.g., transcription factors, kinases, phosphatases) and their vectors and cassette components (promoters, RBSs, terminators). Because the design process uses protein design tools, some components may need to be prepared as libraries in which certain amino acid positions are varied toward a more desired output. For broader characterization, I would also build libraries in which the cassette components are varied.
For now, the components are TF1, TF2, peptidase 1, kinase 1, Cas13 and its gRNAs, condensate scaffold 1, condensate scaffold 2, TF3, TF4, and TF5. These components are listed generically because some of them will be engineered (modified domains, designed motifs, etc.).
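A cassette library in which the components vary is a Cartesian product of part lists, so its size grows multiplicatively. A minimal sketch; all part names are hypothetical placeholders.

```python
from itertools import product

# Hypothetical part names, for illustration only.
promoters = ["pWeak", "pStrong"]
rbss = ["rbsA", "rbsB"]
cds_variants = ["TF1_v1", "TF1_v2", "TF1_v3"]
terminators = ["tA"]

library = [
    {"promoter": p, "rbs": r, "cds": c, "terminator": t}
    for p, r, c, t in product(promoters, rbss, cds_variants, terminators)
]
print(len(library))  # 2 * 2 * 3 * 1 = 12
```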
5.3 DNA edit
(i) What DNA would you want to edit and why?
For my system to work, I would need to integrate the components into the genome, possibly using recombinase-based engineering.
Week 3 HW: Lab automation
Homework
I created the following code by drawing on the Opentrons art platform, copying the coordinates, and asking ChatGPT to generate code to add below the code provided in the original Opentrons HTGAA Google Colab.
Find and describe a published paper that utilizes the Opentrons or an automation tool to achieve novel biological applications.
A massively parallel reporter assay library to screen short synthetic promoters in mammalian cells
The paper uses a massively parallel reporter assay (MPRA), a high-throughput method, to quantify the barcoded mRNA produced by a series of synthetic promoters and to measure how their regulatory activity changes in response to different compounds. They tested 6,144 promoters in both conventional and more specialized mammalian cell lines.
The results were obtained by next-generation sequencing (NGS), counting the reads of the barcoded mRNAs present in the cells.
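A common way to turn barcode read counts into promoter activity is the ratio of RNA barcode reads to DNA (plasmid) barcode reads, each normalized to sequencing depth. This is a general sketch of that idea, not necessarily the exact pipeline used in the paper; the barcode names and counts are made up.

```python
import math


def mpra_activity(rna_counts: dict, dna_counts: dict, pseudocount: float = 1.0) -> dict:
    """log2(RNA CPM / DNA CPM) per barcode; the pseudocount avoids division by zero."""
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    activity = {}
    for barcode, dna in dna_counts.items():
        rna_cpm = (rna_counts.get(barcode, 0) + pseudocount) / rna_total * 1e6
        dna_cpm = (dna + pseudocount) / dna_total * 1e6
        activity[barcode] = math.log2(rna_cpm / dna_cpm)
    return activity


print(mpra_activity({"bc1": 300, "bc2": 100}, {"bc1": 200, "bc2": 200}))
```

Normalizing to DNA counts corrects for promoters that are simply over-represented in the transfected library.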
Write a description about what you intend to do with automation tools for your final project
For my final project, I could use cell-free protein synthesis in a cloud laboratory to screen the post-translational modifications with fluorescence-based assays.
I am going to express the CDK-cyclin complexes, a peptidase, and the corresponding TF that would carry the post-translational modifications.
Using a peptidase that is regulated by these CDK-cyclin complexes would be easy, but such a peptidase first needs to exist and must not be activated by other signals; otherwise the system would be active at the wrong time, and I find this very difficult. Another option is to design it.
If I design the peptidase, I should first test it in the same type of assay.
I can build libraries of variants with different domain changes and test their performance in a well plate.
A possible automated workflow would be:
Echo transfer DNA templates encoding CDK-cyclin complexes, transcription factor variants, protease/peptidase candidates, and fluorescent reporter constructs into specified wells of a 96-well or 384-well plate.
Echo transfer any required cofactors, kinase reagents, ATP supplementation conditions, or small molecules into selected wells depending on the test condition.
Bravo dispense or stamp cell-free protein synthesis master mix into all wells.
Multiflo add lysate or additional common reagents to initiate expression.
PlateLoc seal the plate to prevent evaporation.
Inheco incubate the plate under defined temperature conditions to allow protein synthesis and post-translational regulation to occur.
XPeel remove the seal after incubation.
PHERAstar measure fluorescence or time-course reporter output to compare activation, background signal, and dynamic range across designs.
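Before the Echo step, each well needs an assignment of design, condition, and replicate. A minimal sketch of a row-major 384-well plate map; the design and condition names are hypothetical, and converting this into an instrument pick list is not shown.

```python
import string
from itertools import product


def wells_384() -> list:
    """Row-major well names A1..P24 for a 384-well plate."""
    return [f"{row}{col}" for row in string.ascii_uppercase[:16] for col in range(1, 25)]


def plate_map(designs, conditions, replicates: int) -> list:
    """Assign every design x condition x replicate combination to a well."""
    combos = [(d, c, r) for d, c in product(designs, conditions)
              for r in range(1, replicates + 1)]
    wells = wells_384()
    if len(combos) > len(wells):
        raise ValueError(f"{len(combos)} samples do not fit on a 384-well plate")
    return [{"well": w, "design": d, "condition": c, "replicate": r}
            for w, (d, c, r) in zip(wells, combos)]


layout = plate_map(["TF1_v1", "TF1_v2"], ["+ATP", "-ATP"], replicates=3)
for entry in layout[:3]:
    print(entry)
```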
Part A. Conceptual Questions
Answer any NINE of the following questions from Shuguang Zhang:
Why do humans eat beef but do not become a cow, eat fish but do not become fish?
Because eating beef does not make us incorporate the genetic information that makes a cow, and even if it did, we do not have the machinery to process that information and turn us into a cow.
Why are there only 20 natural amino acids?
I think because they were optimal as building blocks with catalytic functions. It may also have to do with their abundance in early life, and with how well they interacted with RNAs at that time.
Can you make other non-natural amino acids? Design some new amino acids.
My naive attempt is to use diketopyrrolopyrrole as the functional group
(removing the H on the N and forming a covalent bond there).
It could perhaps work as an organic photovoltaic for electrical processing in proteins.
Another could use cyclopentadienyl as the functional group to bind rare-earth metals.
Or pentamethylcyclopentadiene
Where did amino acids come from before enzymes that make them, and before life started?
From abiotic reactions in special environments with rich carbon, nitrogen, and oxygen sources.
If you make an α-helix using D-amino acids, what handedness (right or left) would you expect?
Left-handed.
Can you discover additional helices in proteins?
I found that proteins contain alternative helical structures, most notably 3₁₀-helices, π-helices, and polyproline helices.
Why are most molecular helices right-handed?
I think that, when the chain curves to the left, the repulsion between the C=O groups of the second and third amino acids would be too strong (they would face each other). Also, the hydrogen bonds formed in the right-handed helix would not be as stable in a left-handed one.
Why do β-sheets tend to aggregate?
Because of the repetitive motif: the backbone groups that form the hydrogen bonds of a β-sheet remain exposed at its edge, on the other side of the first β-sheet, where they can bond with further strands.
What is the driving force for β-sheet aggregation?
The hydrophobic effect, because it drives the burial of hydrophobic groups away from the aqueous solvent.
Why do many amyloid diseases form β-sheets?
I think because β-sheets are a common motif, so if there are problems in the folding process, they tend to aggregate. An α-helix, by contrast, would be much harder to aggregate. So if β-sheets are thermodynamically stable, misfolded proteins could easily aggregate together when their hydrophobic motifs allow it.
Can you use amyloid β-sheets as materials?
If they are well designed, yes, why not.
Design a β-sheet motif that forms a well-ordered structure.
What comes to my mind is a toroid-like structure.
Part B: Protein Analysis and Visualization
Briefly describe the protein you selected and why you selected it.
I selected TdT because it polymerizes DNA without any template, which I find cool.
Identify the amino acid sequence of your protein.
MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKG FRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMT GKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCE FRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAV LNDERYQSFKLFTSVFGVGLKTSEKWFRMGFRTLSKVRSDKSLKFTRMQKAGFLYYEDLV SCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGGFRRGKKMGHDVDFLITSPGSTEDEEQL LQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALDHFQKCFLIFKLPRQRVDSDQSS WQEGKTWKAIRVDLVLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKT KRIFLKAESEEEIFAHLGLDYIEPWERNA
It has 509 amino acids. Lysine is the most frequent amino acid, with 50 occurrences (9.82%). A UniProt BLAST search returned 250 homologous proteins (250 is also the tool's default maximum number of hits, so there may be more). It belongs to the DNA polymerase type-X family.
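The length and most-frequent-residue figures can be double-checked with a short composition count. This sketch strips the spaces of the blocked sequence format first; ties between residues are broken by first occurrence. Pasting the TdT sequence above into `composition_summary` is a way to confirm the numbers reported.

```python
from collections import Counter


def composition_summary(seq: str):
    """Length, most frequent residue, its count, and its percentage (whitespace ignored)."""
    clean = "".join(seq.split()).upper()
    counts = Counter(clean)
    residue, count = counts.most_common(1)[0]
    return len(clean), residue, count, 100.0 * count / len(clean)


print(composition_summary("MKK LA"))  # → (5, 'K', 2, 40.0)
```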
Identify the structure page of your protein in RCSB.
4I27 | pdb_00004i27.
It was released on 2013-07-24 and has a good resolution (2.60 Å). The structure includes 2′,3′-dideoxythymidine-5′-triphosphate (ddTTP), a magnesium ion, and a sodium ion. According to the SCOP2 database, it belongs to the DNA nucleotidylexotransferase structural family.
It has more helices than sheets.
It has a hole where the DNA binds.
Part C: Using ML-Based Protein Design Tools
Deep Mutational Scans
A series of mutations that stands out is at T273, to D and E (score +2). A close one is at L274, to D and E (score +3). Another is E264L (score 4.76).
These mutations are close to the DNA-binding pocket, so they may affect the catalysis of TdT or its affinity for DNA.
In red, L274; in yellow, T273.
In magenta, E264; in cyan, single-stranded DNA.
Latent Space Analysis
The neighborhoods group similar proteins together! I could find neighborhoods of polymerases, transcription factors, etc., but I couldn't find my protein.
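"Neighborhoods" in a latent space usually means nearest neighbors by a similarity measure such as cosine similarity between embedding vectors. A minimal sketch with made-up 2-D vectors; real protein embeddings (for example, pooled language-model embeddings) would have hundreds of dimensions, and how the course tool builds its space is not assumed here.

```python
import math


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_neighbors(query, embeddings: dict, k: int = 3) -> list:
    """Names of the k embeddings most similar to `query`."""
    ranked = sorted(embeddings.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


toy = {"polymerase_A": [1.0, 0.0], "tf_B": [0.0, 1.0], "polymerase_C": [0.9, 0.1]}
print(nearest_neighbors([1.0, 0.0], toy, k=2))  # → ['polymerase_A', 'polymerase_C']
```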
Folding a protein
Fold your protein with ESMFold. Do the predicted coordinates match your original structure?
I couldn't view it with py3Dmol, so I opened it with NGL Viewer.
ptm: 0.756 plddt: 89.294
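pTM and pLDDT measure the model's confidence, not its agreement with the experimental structure. A direct answer to "do the coordinates match?" is the RMSD after optimal superposition (Kabsch algorithm). This sketch assumes NumPy is available and that matching Cα coordinates have already been extracted residue by residue from 4I27 and from the ESMFold output (parsing not shown).

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)  # remove translation
    Qc = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)  # SVD of the covariance matrix
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # avoid an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


P = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]], dtype=float)
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)  # 90° about z
Q = P @ Rz.T + np.array([5.0, -2.0, 1.0])
print(f"RMSD: {kabsch_rmsd(P, Q):.3f} Å")  # a rotated, shifted copy gives ~0
```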
Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?
E264L (non-mutated: cyan; mutated: white):
ptm: 0.756 plddt: 89.287.
The structure is maintained. This mutation could improve binding to DNA or other ligands.
Here I tried a destabilizing mutation predicted by the deep mutational scan, F256P:
ptm: 0.755 plddt: 89.634
It keeps the overall shape of the non-mutated version, but I think it could work worse than the original protein, because the binding pocket changes significantly.
Protein Generation. Inverse-Folding a protein
Let’s now use the backbone of your chosen PDB to propose sequence candidates via ProteinMPNN.
New Sequence:PPPPPKVIRPPRPPPPPPPSPPPPPPPSLQKFKDIVVYVLEDNLGEKKRKELKEKLRAAGFTVSDKLNNDVTHIVAYNLTGTEVLDLIKASGIKLDNTPKLLKISWAEDCIEAGKPVEITPEYILPVEPPPKDKSIPPPPPPPPPPREPLSPYACKRRCSLIDYNKKFVDTFNILAEYYRFLNNSEKADKYNRAAAQLKSLPFEIKSMEDLEGIPHIYPEIRKIIEEILKNGYSTEVEKILNDPYFKTKKLFTSIYGFGLATADKYYKAGYTSIEEVKSDKSIKFSEEQKAGLKYLKDLTRPITREEALRIHEIIKEAVHAFLPDAIVELVGSFARGAETSRDVDFLISSPTWKGDQTLLEKVIEYLKEKGLLLYYKLTPSTYDPNALPSTDVNAPSPFQRVDMIFKLPLEEEEEQLGDRPPGKKWRAVKVDLTLVPYDRFAYARLYFTASPQFRRDLIEYARDERGMLLSSTSLYDLKKKEFISASSVEEIYAALGLPYIPPEELNC
I then used ESMFold to see how the new sequence folds. It was interesting.
ptm: 0.761 plddt: 93.088
It is different, but at the same time it is not; I'm not sure.
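One way to quantify "different, but not" is the sequence identity between the designed and native sequences. Because ProteinMPNN designs one residue per backbone position, its output can be compared position by position with the native sequence of the same parsed chain (not the full UniProt entry), with no alignment step. A minimal sketch:

```python
def percent_identity(a: str, b: str) -> float:
    """Identity of two pre-aligned, equal-length sequences, in percent."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


print(percent_identity("MKTAYIAK", "MKSAYLAK"))  # → 75.0
```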
Then I wanted to change only the amino acids of the binding site. I used PyMOL to select the residues within 4 Å of the ligand and asked ChatGPT to write code to make this possible.
The code is:
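After the line print(f"Length of chain {chain} is {l}") in the Colab, add:
# Residues within 4 Å of the ligand (selected in PyMOL). These are the only
# positions ProteinMPNN may redesign; every other position is kept fixed.
# Caution: ProteinMPNN counts positions 1..L along the parsed chain sequence,
# so PDB residue numbers must be shifted if the chain does not start at 1 or has gaps.
positions_to_design = [253, 255, 256, 257, 258, 259, 260, 261, 262,
                       288, 332, 333, 336, 338, 340, 341, 342, 343,
                       345, 381, 397, 398, 405, 432, 434, 449, 450,
                       452, 457, 461]
design_set = set(positions_to_design)

fixed_positions_dict = {}
name = pdb_dict_list[0]['name']
fixed_positions_dict[name] = {}
for chain in designed_chain_list:
    chain_length = len(pdb_dict_list[0][f"seq_chain_{chain}"])
    # Fix every position that is not part of the binding site.
    fixed_positions = [i for i in range(1, chain_length + 1) if i not in design_set]
    fixed_positions_dict[name][chain] = fixed_positions

print("fixed_positions_dict:")
print(fixed_positions_dict)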
New Sequence:MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFTSVYGVGLKTSEKWFRMGFRTLSKVRSDKSLKFTRTQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGSFRRGAKESRDVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALSPFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVDLTLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA
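Sampling more sequences this way gives a library in which only the binding site changes; some variants will perform better and some worse. If I want to optimize the rest of the structure instead, I can do the inverse: fix the binding site and redesign everything else.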
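Both strategies only differ in which positions are passed to ProteinMPNN as fixed. A minimal sketch of a helper that returns either set; the function name and the `redesign` values are my own, not part of the ProteinMPNN Colab.

```python
def fixed_positions(chain_length: int, binding_site, redesign: str = "binding_site") -> list:
    """1-based positions to keep fixed.

    redesign="binding_site": fix everything except the binding site.
    redesign="scaffold": fix only the binding site (the inverse strategy).
    """
    site = set(binding_site)
    positions = range(1, chain_length + 1)
    if redesign == "binding_site":
        return [i for i in positions if i not in site]
    if redesign == "scaffold":
        return [i for i in positions if i in site]
    raise ValueError("redesign must be 'binding_site' or 'scaffold'")


print(fixed_positions(6, [2, 3]))              # → [1, 4, 5, 6]
print(fixed_positions(6, [2, 3], "scaffold"))  # → [2, 3]
```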
Part D. Group Brainstorm on Bacteriophage Engineering
I think DnaJ independence of the L protein could be achieved computationally, with software such as ProteinMPNN for inverse folding of this or other sequences, EVOLVEpro for in silico mutagenesis to explore variants, Foldseek to search for other naturally occurring sequences with the same structure, ESMFold to confirm the 3D structure, etc.
One idea I have is to anchor a synthetic domain onto the L protein to help its conformational change, so that one L protein could help another without the need for DnaJ. The interactions may be difficult to establish, but this could help the accumulation of phages upon oligomerization.
The problem is the size constraint of the MS2 genome. This addition would make the genome bigger, which could lead to less efficient translation and replication and less RNA packaged in the capsid.
Another option is to use the Lodj mutant but with lower translation efficiency, by changing nucleotides involved in the translation of the peptide, to control how much Lodj accumulates before lysis. The replication rate could also be tuned.
For the first idea, the pipeline could be:
1- Identify the DnaJ residues that interact with the L protein and their positions in the 3D structure (for example, by modeling the complex with AlphaFold-Multimer and Boltz-1 to get the PDB and inspecting the interface residues in a visualization tool).
2- Use RFdiffusion or other software to build a backbone for a domain (small, if possible) that carries those residues as constraints, or consider using the entire DnaJ domain.
3- Check folding, stability, etc.
4- Anchor that domain to the L protein with a linker. Ideally, the domain should not interact with the N-terminal domain of its own L protein, only with that of neighboring L proteins; a good design could help with this.
5- Test the interactions: how the domain interacts with the N-terminal domain compared with DnaJ, different anchoring positions, optimization, etc. Tools: AlphaFold-Multimer and Boltz-1 for interaction modeling, ProteinMPNN for inverse folding, ESMFold to check the 3D structure, and EVOLVEpro and ESM-2 for mutational scoring during optimization.
A pitfall could be not having enough data on the interactions between DnaJ and the L protein, or the designed peptides not working well because of the limits of the tools.
For the second idea, I would identify the translational motifs that interact with the ribosome or other translation machinery, find their sequences, check whether there are data linking sequence variants to faster or slower translation, and design the new sequence accordingly.
Week 5 HW: Protein design part II
Homework
Part A: SOD1 Binder Peptide Design (From Pranam)
Part 1: Generate Binders with PepMLM
1- Begin by retrieving the human SOD1 sequence from UniProt (P00441) and introducing the A4V mutation.
2- Use the PepMLM Colab linked from the Hugging Face PepMLM-650M model card.
3- Generate four peptides of length 12 amino acids conditioned on the mutant SOD1 sequence:
Control: FLYRWLPSRRGG
(X positions were set to A in all the peptides.)
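Introducing A4V in step 1 needs care with numbering: A4V is named in the historical numbering of the mature protein, which omits the initiator methionine, so in the UniProt P00441 sequence (which starts with Met) this alanine is at position 5. The helper below is a small sketch that checks the wild-type residue before substituting and takes the numbering offset as a parameter; only the first SOD1 residues are shown, so paste the full UniProt sequence in practice.

```python
import re


def apply_point_mutation(seq: str, mutation: str, offset: int = 0) -> str:
    """Apply a substitution such as 'A4V' to `seq`.

    `offset` is added to the mutation's position to convert it to the 1-based
    numbering of `seq` (offset=1 when the name omits the initiator Met).
    """
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if m is None:
        raise ValueError(f"Unrecognised mutation: {mutation}")
    wt, pos, mut = m.group(1), int(m.group(2)), m.group(3)
    idx = pos + offset - 1  # 0-based index into seq
    if seq[idx] != wt:
        raise ValueError(f"Expected {wt} at position {pos + offset}, found {seq[idx]}")
    return seq[:idx] + mut + seq[idx + 1:]


sod1_nterm = "MATKAVCVLKG"  # N-terminus of human SOD1 (UniProt P00441)
print(apply_point_mutation(sod1_nterm, "A4V", offset=1))  # MATKVVCVLKG
```

Without the offset the check fails (position 4 of the UniProt sequence is K), which is exactly the mistake the wild-type check is meant to catch.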
Part 2: Evaluate Binders with AlphaFold3
1- Navigate to the AlphaFold Server: alphafoldserver.com.
2- For each peptide, submit the mutant SOD1 sequence followed by the peptide sequence as separate chains to model the protein-peptide complex.
3- Record the ipTM score and briefly describe where the peptide appears to bind. Does it localize near the N-terminus where A4V sits? Does it engage the β-barrel region or approach the dimer interface? Does it appear surface-bound or partially buried?
Results:
P0: ipTM = 0.22, pTM = 0.80. It appears to bind at the dimer interface, surface-bound.
P1: ipTM = 0.20, pTM = 0.83. Binds between the N-terminus and the β-barrel, partially buried. (ipTM was 0.3 in another run that I made accidentally.)
P2: ipTM = 0.48, pTM = 0.89. Binds between the β-barrel and the dimer interface, possibly partially buried.
4- In a short paragraph, describe the ipTM values you observe and whether any PepMLM-generated peptide matches or exceeds the known binder.
The ipTM values were generally low, but P2 stands out among the generated peptides, and it also exceeds the known binder.
Part 3: Evaluate Properties of Generated Peptides in the PeptiVerse
1- Paste the peptide sequence.
2- Paste the A4V mutant SOD1 sequence in the target field.
3- Check the boxes
(P0, as an example)
💧 Solubility: Soluble (1.000)
🔬 Permeability (Penetrance): Non-permeable (0.043)
🩸 Hemolysis: Non-hemolytic (0.067)
👯 Non-Fouling: Fouling (0.360)
⏱️ Half-Life: 1.167 h
🔗 Binding Affinity: Weak binding (5.364 pKd/pKi)
⚖️ Molecular Weight: 1431.6 Da
⚡ Net Charge (pH 7): -1.14
🎯 Isoelectric Point: 5.55
💦 Hydrophobicity (GRAVY): -0.50
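The GRAVY value is the grand average of hydropathy: the mean Kyte-Doolittle hydropathy over all residues, so negative values mean a net hydrophilic peptide. A minimal sketch, applied here to the control peptide; PeptiVerse may use its own implementation, so small differences are possible.

```python
# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    peptide = peptide.upper()
    return sum(KD[aa] for aa in peptide) / len(peptide)


print(round(gravy("FLYRWLPSRRGG"), 3))  # -0.708
```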
Predicted binding affinity (pKd/pKi):
P0: 5.364
P1: 6.013
P2: 5.406
P3: 5.446
Control: 5.555
Compare these predictions to what you observed structurally with AlphaFold3. In a short paragraph, describe what you see. Do peptides with higher ipTM also show stronger predicted affinity? Are any strong binders predicted to be hemolytic or poorly soluble? Which peptide best balances predicted binding and therapeutic properties?
Not necessarily. P2 had the highest ipTM, but P1 had the strongest predicted affinity (6.013 vs. 5.406 pKd/pKi), so there is no clear correlation between the AlphaFold3 and PeptiVerse results. None of the peptides was predicted to be hemolytic or poorly soluble. I will continue with P1: it has the best predicted binding, its other properties look acceptable, and its AlphaFold3 ipTM was not too poor.
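One way to make the "no clear correlation" statement concrete is a rank (Spearman) correlation between the AlphaFold3 ipTM values and the PeptiVerse pKd/pKi predictions. The sketch uses the three peptides that have both numbers in these notes (P0-P2, first-run ipTM values); with only three points the coefficient is anecdotal, not statistically meaningful.

```python
import math


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sxx * syy)


iptm = [0.22, 0.20, 0.48]    # P0, P1, P2: AlphaFold3 ipTM (first runs)
pkd = [5.364, 6.013, 5.406]  # P0, P1, P2: PeptiVerse pKd/pKi
print(round(spearman(iptm, pkd), 2))  # -0.5
```

A negative rho here only says that, for these three peptides, a higher ipTM did not go with a higher predicted affinity.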
Part 4: Generate Optimized Peptides with moPPIt
After generation, briefly describe how these moPPIt peptides differ from your PepMLM peptides. How would you evaluate these peptides before advancing them to clinical studies?
The ipTM values were very low. Some peptides bound far from the selected region (residues 1-10). Peptides 1, 4 and 5 were closest to the N-terminus, all of them surface-bound.
Part C: Final Project: L-Protein Mutants
I'm going with Option 2: Mutagenesis using AF2-Multimer.
The goal is to optimize the structure of the N-terminal domain while weakening its binding to DnaJ.
Useful data:
L protein sequence: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
Index Position Wild_Type_AA Mutation_AA LLR_Score
989 50 K L 2.561464
574 29 C R 2.395427
769 39 Y L 2.241778
575 29 C S 2.043150
173 9 S Q 2.014323
573 29 C Q 1.997049
572 29 C P 1.971028
569 29 C L 1.960646
987 50 K I 1.928798
1049 53 N L 1.864930
1209 61 E L 1.818096
1029 52 T L 1.813965
984 50 K F 1.802066
576 29 C T 1.797247
568 29 C K 1.795877
93 5 F Q 1.795244
94 5 F R 1.659716
560 29 C A 1.648655
534 27 Y R 1.628060
434 22 F R 1.602028
92 5 F P 1.596889
997 50 K V 1.594573
995 50 K S 1.574555
96 5 F T 1.559023
95 5 F S 1.556416
889 45 A L 1.539248
775 39 Y S 1.517457
535 27 Y S 1.497052
789 40 V L 1.477630
529 27 Y L 1.474638
435 22 F S 1.423357
563 29 C E 1.383282
760 39 Y A 1.364997
571 29 C N 1.362601
980 50 K A 1.357792
567 29 C I 1.344121
89 5 F L 1.332615
334 17 N R 1.323652
767 39 Y I 1.320101
776 39 Y T 1.302803
514 26 D R 1.268762
566 29 C H 1.246106
764 39 Y F 1.245850
777 39 Y V 1.244389
454 23 K R 1.236555
494 25 E R 1.229349
474 24 H R 1.227778
996 50 K T 1.222128
533 27 Y Q 1.218850
536 27 Y T 1.215567
Rank Mutation_AA Position LLR_Score
0 L 50 2.561468
1 L 39 2.241780
2 I 50 1.928801
3 L 53 1.864932
4 L 52 1.813968
5 F 50 1.802069
6 V 50 1.594576
7 S 50 1.574557
8 L 45 1.539248
9 S 39 1.517457
10 L 40 1.477630
11 A 39 1.364999
12 A 50 1.357795
13 I 39 1.320103
14 T 39 1.302804
15 F 39 1.245851
16 V 39 1.244390
17 T 50 1.222131
18 L 54 1.120860
19 R 39 1.064191
Index Position Wild_Type_AA Mutation_AA LLR_Score
989 50 K L 2.561468
574 29 C R 2.395427
769 39 Y L 2.241780
575 29 C S 2.043150
173 9 S Q 2.014325
573 29 C Q 1.997049
572 29 C P 1.971029
569 29 C L 1.960646
987 50 K I 1.928801
1049 53 N L 1.864932
...
996 50 K T 1.222131
533 27 Y Q 1.218851
536 27 Y T 1.215567
I selected these mutations based on the predicted ESM-2 scores, keeping only those in the N-terminal region used for modelling: C29R, C29S, S9Q, C29Q, C29P, C29L, C29T, C29K, F5Q, F5R, C29A, Y27R, F22R, F5P, F5T, F5S, Y27S, Y27L, F22S, C29E, C29N, C29I, F5L, N17R, D26R, C29H, K23R, E25R, H24R, Y27Q, Y27T
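The first column of the first and third LLR tables appears to be a flattened index into a (sequence length × 20) score matrix: every listed row satisfies index = (position - 1) × 20 + k, where k is the substituted residue's place in the alphabet ACDEFGHIKLMNPQRSTVWY (for example 989 = 49 × 20 + 9 for K50L). Assuming that layout, the sketch below decodes the index and keeps substitutions in residues 1-30; applied to all rows of the first table it yields the list above, in the same order.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # assumed column order of the LLR matrix
L_PROTEIN = ("METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSK"
             "FTNQLLLSLLEAVIRTVTTLQQLLT")


def decode(flat_index: int, seq: str = L_PROTEIN) -> str:
    """Turn a flattened (position x 20) index into a label such as 'K50L'."""
    pos0, k = divmod(flat_index, len(ALPHABET))
    return f"{seq[pos0]}{pos0 + 1}{ALPHABET[k]}"


def select_region(rows: list[tuple[int, float]], start: int = 1, end: int = 30,
                  seq: str = L_PROTEIN) -> list[str]:
    """Keep (flat_index, LLR) rows whose residue lies in start..end, highest LLR first."""
    kept = []
    for flat_index, llr in sorted(rows, key=lambda r: r[1], reverse=True):
        pos = flat_index // len(ALPHABET) + 1
        if start <= pos <= end:
            kept.append(decode(flat_index, seq))
    return kept


# First five rows of the first table
rows = [(989, 2.561464), (574, 2.395427), (769, 2.241778), (575, 2.043150), (173, 2.014323)]
print(select_region(rows))  # ['C29R', 'C29S', 'S9Q']
```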
I wrote a script with ChatGPT that does the following:
start from the mutations (using only residues 1-30 of the L protein)
→ score them with ESM-2 LLR
→ choose the top stable candidates
→ run ColabFold on the peptide alone
→ run ColabFold on the peptide + chaperone (DnaJ)
→ compute a final score that favours:
mono_pLDDT high
mono_pTM high
complex_ipTM low
peptide-chaperone contacts low
→ rank candidates by final_score (highest first)
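The criteria above give the direction of each term but not how they are combined. One plausible combination, shown purely as a sketch, is a weighted sum in which pLDDT (rescaled to 0-1) and pTM add to the score while ipTM and a normalised contact count subtract from it. The weights, the MAX_CONTACTS normalisation and the field names are hypothetical, not the ones in the ChatGPT-generated script.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    mutation: str
    mono_plddt: float    # mean pLDDT of the L N-terminal peptide alone (0-100)
    mono_ptm: float      # pTM of the peptide alone (0-1)
    complex_iptm: float  # ipTM of peptide + DnaJ (0-1)
    contacts: int        # number of peptide-chaperone residue contacts


# Hypothetical weights and normalisation, not taken from the original script
WEIGHTS = {"plddt": 1.0, "ptm": 1.0, "iptm": 1.0, "contacts": 1.0}
MAX_CONTACTS = 50


def final_score(c: Candidate) -> float:
    """Higher is better: reward a confident monomer, penalise predicted binding to DnaJ."""
    contact_term = min(c.contacts, MAX_CONTACTS) / MAX_CONTACTS  # scaled to 0-1
    return (WEIGHTS["plddt"] * c.mono_plddt / 100
            + WEIGHTS["ptm"] * c.mono_ptm
            - WEIGHTS["iptm"] * c.complex_iptm
            - WEIGHTS["contacts"] * contact_term)


def rank_candidates(cands: list[Candidate]) -> list[Candidate]:
    """Sort candidates by final_score, highest first."""
    return sorted(cands, key=final_score, reverse=True)
```

Putting every term on a 0-1 scale keeps one term (for example raw pLDDT, which runs to 100) from dominating; the weights can then be tuned to trade folding confidence against loss of DnaJ binding.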
Genetic Circuits Part I: Assembly Technologies
Assignment: DNA Assembly
What are some components in the Phusion High-Fidelity PCR Master Mix and what is their purpose?
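Phusion High-Fidelity DNA polymerase, reaction buffer with MgCl2, and dNTPs, among others. The polymerase has 3'→5' exonuclease (proofreading) activity, so it copies DNA with a low error rate; Mg2+ is the cofactor the polymerase needs; the dNTPs are the building blocks of the new strands.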
What are some factors that determine primer annealing temperature during PCR?
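How many hydrogen bonds the primer forms with the template, which depends on base composition (G-C pairs form three hydrogen bonds, A-T pairs two) and on primer length.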
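For short primers, a rule of thumb that captures this is the Wallace rule, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C), which weights G-C pairs more because of their extra hydrogen bond. It ignores salt, primer concentration and nearest-neighbour stacking, so for Phusion a nearest-neighbour calculator (such as NEB's Tm Calculator) is the better choice; the sketch only illustrates the dependence on composition and length.

```python
def wallace_tm(primer: str) -> int:
    """Wallace rule: Tm (°C) ~ 2*(A+T) + 4*(G+C); only a rough guide for short oligos."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


print(wallace_tm("ATGCATGCATGCATGCATGC"))  # 60
```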
There are two methods from this class that create linear fragments of DNA: PCR, and restriction enzyme digests. Compare and contrast these two methods, both in terms of protocol as well as when one may be preferable to use over the other.
Restriction digestion depends more on the sequence (suitable recognition sites must be present, and only where you want the cuts) and does not scale as easily. PCR is more independent of the sequence in those respects, but it requires designing primers.
How can you ensure that the DNA sequences that you have digested and PCR-ed will be appropriate for Gibson cloning?
By amplifying the fragments with primers that add exact overlapping sequences flanking the desired junctions, so that each fragment shares an overlap with its neighbours.
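As a sketch of how such overlaps are added: the forward primer for a fragment carries, at its 5' end, the last bases of the upstream fragment followed by a region that anneals to the fragment; the reverse primer does the same with the start of the downstream fragment, reverse-complemented. The default 20 bp overlap and 20 nt annealing lengths are assumptions, and real primers should also be checked for Tm, hairpins and dimers.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gibson_primers(upstream: str, insert: str, downstream: str,
                   overlap: int = 20, anneal: int = 20) -> tuple[str, str]:
    """Primers that amplify `insert` with overlaps to its neighbours.

    The PCR product is upstream[-overlap:] + insert + downstream[:overlap].
    """
    fwd = upstream[-overlap:] + insert[:anneal]
    rev = revcomp(insert[-anneal:] + downstream[:overlap])
    return fwd, rev


# Toy sequences with short lengths so the result is easy to check by eye
print(gibson_primers("AAAACCCC", "GGGGTTTT", "ACGTACGT", overlap=4, anneal=4))
# ('CCCCGGGG', 'ACGTAAAA')
```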
How does the plasmid DNA enter the E. coli cells during transformation?
The cells are first made competent, and the DNA is usually introduced in one of two ways: heat shock or electroporation. Both transiently disrupt the membrane so that the DNA can enter the cells.
Describe another assembly method in detail (such as Golden Gate Assembly). Explain the other method in 5 - 7 sentences plus diagrams (either handmade or online). Model this assembly method with Benchling or Asimov Kernel!
Golden Gate Assembly is a cloning method that uses Type IIS restriction enzymes (such as BsaI). These enzymes cut outside, but close to, their recognition site, so the overhangs can be chosen to specify how the parts join. Entry vectors and parts carry the corresponding recognition sites and the future overhangs.
This method allows cloning of up to 50 parts in one reaction, which is useful for multigene assemblies and assemblies from part libraries. A common overhang notation based on part type enables MoClo (Modular Cloning): level 0 holds the basic parts of a gene; level 1 assembles them into a single transcription unit; level 2 combines several transcription units into a multigene construct.
A typical reaction program is: 50 cycles of (5 min at 40 °C for digestion, 5 min at 16 °C for ligation), then 5 min at 50 °C.
The entry vectors usually contain a lacZ cassette for blue-white screening; white colonies carry the final insert.
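The overhang logic can be modelled in a few lines. The sketch assumes BsaI (site GGTCTC, cutting 1 nt after the site on that strand and 5 nt after it on the other, which leaves a 4-nt 5' overhang) and parts given as top-strand sequences with one site in each orientation. It extracts each part's (left overhang, core, right overhang) and chains parts whose right overhang matches the next part's left overhang, which is the rule that fixes the order of the assembly. The part sequences, overhangs and the start/end overhangs standing in for the vector are hypothetical.

```python
SITE = "GGTCTC"     # BsaI recognition site, forward orientation
SITE_RC = "GAGACC"  # the same site in reverse orientation


def digest(part: str) -> tuple[str, str, str]:
    """Return (left_overhang, core, right_overhang) of the fragment BsaI releases.

    Assumes one forward site upstream and one reverse site downstream of the part.
    Both overhangs are written as top-strand sequence.
    """
    s = part.upper()
    j = s.index(SITE)      # top strand is cut 1 nt after the forward site
    i = s.rindex(SITE_RC)  # top strand is cut 5 nt before the reverse site
    left = s[j + 7:j + 11]
    right = s[i - 5:i - 1]
    core = s[j + 11:i - 5]
    return left, core, right


def assemble(parts: list[str], start: str, end: str) -> str:
    """Chain parts whose left overhang matches the previous right overhang.

    `start` and `end` stand for the overhangs provided by the destination vector.
    Raises KeyError if no remaining part continues the chain.
    """
    by_left = {}
    for p in parts:
        left, core, right = digest(p)
        by_left[left] = (core, right)
    seq, current = start, start
    while current != end:
        core, right = by_left.pop(current)
        seq += core + right
        current = right
    return seq


# Two hypothetical parts with overhangs AATG|GCTT and GCTT|CGCT, given out of order
p1 = "GGTCTCA" + "AATG" + "CCCC" + "GCTT" + "TGAGACC"
p2 = "GGTCTCA" + "GCTT" + "GGGG" + "CGCT" + "TGAGACC"
print(assemble([p2, p1], start="AATG", end="CGCT"))  # AATGCCCCGCTTGGGGCGCT
```

Because the recognition sites stay on the discarded ends, the assembled product no longer contains them and cannot be re-cut.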
I did not have access to Asimov Kernel, so I could not do this part of the homework.