Week 4 HW: Protein design part I
Homework
Part A. Conceptual Questions
Answer any NINE of the following questions from Shuguang Zhang:
Why do humans eat beef but do not become a cow, eat fish but do not become fish? Because eating beef doesn't make us incorporate the genetic information that makes a cow, and even if we could, we don't have the machinery to process that information and turn us into a cow.
Why are there only 20 natural amino acids? I think because they were the optimal set to serve as building blocks and to provide catalytic function. It may also have to do with their abundance in early life, and with how well they suited interactions with RNAs at that time.
Can you make other non-natural amino acids? Design some new amino acids.
My naive attempt is to use diketopyrrolopyrrole as the functional group

(Removing the H next to the N and forming a covalent bond there.) Maybe it could work as an organic photovoltaic for electrical processing in proteins.
Another could use cyclopentadienyl as the functional group to bind rare earth metals

Or pentamethylcyclopentadiene

Where did amino acids come from before enzymes that make them, and before life started? From abiotic reactions in special environments rich in carbon, nitrogen, and oxygen sources.
If you make an α-helix using D-amino acids, what handedness (right or left) would you expect? Left-handed.
Can you discover additional helices in proteins? I found that proteins contain alternative helical structures, most notably 3₁₀-helices, π-helices, and polyproline helices.
Why are most molecular helices right-handed? I think that when the chain curves to the left, the repulsion between the C=O groups of the second and third amino acids is too strong (they end up facing each other). Also, the H-bonds formed in the right-handed helix wouldn't be as stable in the left-handed one.
Why do β-sheets tend to aggregate? Because of the repetitiveness of the motif: the groups that form the initial bonds of a β-sheet are also exposed on the other side of the first β-sheet, so more strands can attach. What is the driving force for β-sheet aggregation? The hydrophobic effect, because it drives the burial of hydrophobic groups away from the aqueous solvent.
Why do many amyloid diseases form β-sheets? I think because β-sheets are a common motif, so if there are problems in the folding process, proteins could tend to aggregate through them. An α-helix, by comparison, would be very difficult to aggregate. So if β-sheets are thermodynamically stable, it would be easy for misfolded proteins to aggregate together when their hydrophobic motifs allow it. Can you use amyloid β-sheets as materials? If they are well designed, yes, why not.
Design a β-sheet motif that forms a well-ordered structure. What comes to my mind is a toroid-like structure.
Part B: Protein Analysis and Visualization
Briefly describe the protein you selected and why you selected it. I selected TdT (terminal deoxynucleotidyl transferase) because it polymerizes DNA without any template, and I find it cool.
Identify the amino acid sequence of your protein. MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKG FRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMT GKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCE FRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAV LNDERYQSFKLFTSVFGVGLKTSEKWFRMGFRTLSKVRSDKSLKFTRMQKAGFLYYEDLV SCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGGFRRGKKMGHDVDFLITSPGSTEDEEQL LQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALDHFQKCFLIFKLPRQRVDSDQSS WQEGKTWKAIRVDLVLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKT KRIFLKAESEEEIFAHLGLDYIEPWERNA
It has 509 AA. Lysine is the most frequent amino acid, with 50 occurrences (9.82%). It is homologous to 250 proteins (using the UniProt BLAST tool). It belongs to the DNA polymerase type-X family.
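The length and composition figures can be reproduced with a few lines of Python. This is only a minimal sketch, not necessarily how the original numbers were obtained: `composition` is a hypothetical helper name, and the demo string is just the first 60 residues of the sequence above. Pasting the full sequence (spaces are ignored) should let you check the 509 AA and lysine counts.

```python
from collections import Counter


def composition(sequence):
    """Return (length, [(residue, count, percent), ...]) sorted by count, highest first."""
    # Drop whitespace so a sequence pasted in 60-residue blocks still works.
    seq = "".join(sequence.split()).upper()
    counts = Counter(seq)
    table = [(aa, n, round(100 * n / len(seq), 2)) for aa, n in counts.most_common()]
    return len(seq), table


if __name__ == "__main__":
    # First 60 residues of the TdT sequence above (illustration only).
    fragment = "MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKG"
    length, table = composition(fragment)
    print(length)
    for aa, n, pct in table[:3]:
        print(aa, n, pct)
```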
Identify the structure page of your protein in RCSB. 4I27 | pdb_00004i27. It was released on 2013-07-24. It has a good resolution (2.60 Å). The structure includes 2’,3’-DIDEOXY-THYMIDINE-5’-TRIPHOSPHATE, a magnesium ion, and a sodium ion. According to the SCOP 2 database, it belongs to the DNA nucleotidylexotransferase structural family.
It has more helices than sheets.
It has a hole where DNA binds.
Part C: Using ML-Based Protein Design Tools
Deep Mutational Scans
One series of mutations that stands out is T273 to D or E (+2 score). Another nearby one is L274 to D or E (+3 score). Another is E264 to L (4.76 score). These mutations are close to the DNA-binding pocket, so they may affect the catalysis of TdT or its affinity for DNA.
In red L 274, in yellow T 273.
In magenta E 264, in cyan single strand DNA.
Latent Space Analysis

The neighborhoods approximate similar proteins! I could find neighborhoods of polymerases, transcription factors, etc. But I couldn't find my protein.
Folding a protein
Fold your protein with ESMFold. Do the predicted coordinates match your original structure? I couldn't see it with py3Dmol, so I opened it with NGL Viewer.

ptm: 0.756 plddt: 89.294
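Beyond the single averaged pLDDT, it can help to see which residues are predicted with low confidence. The sketch below assumes that the predicted PDB file stores per-residue pLDDT in the B-factor column, which is common for structure predictors but should be checked for your own file. It also assumes a 0-100 scale for the cutoff. The function names are hypothetical, and a mean over C-alpha atoms may differ slightly from the value the notebook prints.

```python
def ca_plddt(pdb_text):
    """Map (chain, residue number) -> B-factor value of each C-alpha atom.

    Assumption: the B-factor column holds the per-residue pLDDT.
    """
    scores = {}
    for line in pdb_text.splitlines():
        # PDB is a fixed-column format: atom name in columns 13-16,
        # chain in 22, residue number in 23-26, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[(line[21], int(line[22:26]))] = float(line[60:66])
    return scores


def summarize(scores, cutoff=70.0):
    """Return the mean score and the residues scoring below `cutoff`."""
    mean = sum(scores.values()) / len(scores)
    low = sorted(key for key, value in scores.items() if value < cutoff)
    return mean, low
```

With a saved prediction (the file name is up to you), `summarize(ca_plddt(open("prediction.pdb").read()))` would give the mean and the list of low-confidence residues to inspect in the viewer.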
Try changing the sequence, first try some mutations, then large segments. Is your protein structure resilient to mutations?
264 E to L (Non-mutated: cyan, mutated: white.):

ptm: 0.756 plddt: 89.287. The structure is maintained. This mutation could improve binding to DNA or other ligands.
Here I tried a destabilizing mutation predicted by the deep mutational scan, 256 F to P:

ptm: 0.755 plddt: 89.634. It has the same overall shape as the non-mutated version, but I think it could work worse than the original protein. The binding pocket changes significantly.
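When preparing mutated sequences for folding, it is easy to be off by one residue. A minimal sketch of a helper that applies a point mutation written like E264L and refuses to proceed if the wild-type residue is not where it is expected. The function name is hypothetical, and positions are assumed to be 1-based along the sequence.

```python
import re


def apply_point_mutation(sequence, mutation):
    """Apply a mutation written like 'E264L' (wild type, 1-based position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"Cannot parse mutation {mutation!r}")
    wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"Position {position} is outside the sequence")
    found = sequence[position - 1]
    if found != wild_type:
        # Guards against off-by-one numbering between tools.
        raise ValueError(f"Expected {wild_type} at {position}, found {found}")
    return sequence[: position - 1] + new + sequence[position:]
```

For example, `apply_point_mutation(tdt_sequence, "F256P")` would return the destabilized variant, where `tdt_sequence` stands for the full sequence with the spaces removed.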
Protein Generation. Inverse-Folding a protein
Let’s now use the backbone of your chosen PDB to propose sequence candidates via ProteinMPNN.
tmp, score=1.6401, fixed_chains=[], designed_chains=[‘A’], model_name=v_48_020 MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFTSVFGVGLKTSEKWFRMGFRTLSKVRSDKSLKFTRMQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGGFRRGKKMGHDVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALDHFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVDLVLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA T=0.2, sample=0, score=0.9088, seq_recovery=0.4665 APPKPVVVRPPPPPPPPPPPPPPPPSPSLRKFKDIHVYILEKNLGKKNREKLKEEARKAGFTVSDELDNDVTHIVAENLTGSEVLELLKESGVKLKNTPKLLKISWLKESIKAGKPVPITPEHILKVEPPLEDPTLPPPPPPPPPPPPPLSPYACKRRTTLENKNQKFVDAFETLAEYYEYNNKEEEARRYRRAAAQLRSLPFRIRSMEDLEGIPHIGPEIRAIIEEILKNGYSSKVEAILNDPYFQTMKLFTSVYGFGLKRAKKFYKRGYTSIEEVKNDKSIKFTDEQKAGLKYLEDLRRPITKEEALAIHEIIKKAVHKFLPDAEVVLVGSFRRGHKTSKDVDFLISSPTWKGDQTLLGKVIEHLKAEGKLLYYKLTPSTYDPNQLPSEDINAPPKFARVDMILKLPLSSEKPAEKGRKPGVDYKAVKVDLTLVPYERKAYALLAFTASPQFLEDLRRYAKEEKGMLLDDSSLYDLKKGEFISAKTVKEIFDALGLEYIPPEEMNA T=0.2, sample=0, score=0.9001, seq_recovery=0.4370 PPPPPVVPIPPPPPPPPPPPPPPPPPPSLQRFADIVVYVLPENLGEAEKAALEAALRAAGFTVESELSNKTTHIVAKNLTGKEVLELLEKSGFKLENKPKLLKISWAEECLAAGKPVPITPEHILPIEPPPAVPLLPPPPPPPPPPRPPLSPYACKRKCSLKDYNQKFVDAFEVLARYYEFKGDREKAERYRRAAAILKSLPFKIEKLSDLEGIPHIGPEIRALIEEILKNGYSSEVQAVLADPYFQTYKLFLSVYGFGLERAKEFYERGYTSIEEVKEDPSIEFTPKQKAGLKYLEDLRRPVTREEAEQIHEIIKEAVHAVLPDAKVEIVGRFRRGHETARDVDFLISSPSLTGDQSVLGRVIERLKEQGLLLYAEYTPSTYQPDQLPSEDIDAPPPYARVDMIFKLPLKPEAPAEAGREPGKTYKAVKVDLYLVPYENYAYALLYFTGSRQFIRDLKRYAKEERGMLLDYTRLYDKKKKEFIEAKTEQEIYDALGLEYIPPELRNA T=0.2, sample=0, score=0.9000, seq_recovery=0.4567 
PPPPPVVVVPPPPPAPPPPPPPPPPPPALQKFKDITVYILEKNLGEKEKEKLKKKARAAGFNVSDKLNNDVTHIVAKNLTGQEVLDLIKASKVKLTKTPTLLKISWLEECIKAGRPVPITPEHLLPVEPPLAAPDLPPPPPPPPPPPPPVSPYACKRRCSLEDKNQKFVDTFEILAKYYEFKNKEEEAERYRRAAAILRSLPFEIRTMEDLEGLPHIDEEIRAIIKEILENGYSSLVEKILNDPYFQTMKLFTSIYGVGLKTAKKFYKKGYTTVEQVRNDPTIKFTPKQKAGLKYLEDLRRPITREEAEKIHEIIKEAVHKHAPDAIVEVVGRFRRGHKESRDVDFLISSPTWTGDQSLLGRVLEDLKKRGLLLYAELTPSTYQPNQLPSTDPNAPIPFERVDMIFKLPKSGEEPEEEGEPPGRDWKAVKVDLVLVPYERRAYALLYFTGSPQFVRDLMEYARNERGMLLDAYRLYDTKTNQFLTAASEEEIYAALGLEYIPPEQRNC T=0.2, sample=0, score=0.9176, seq_recovery=0.4724 APKPPKIPIPPPPPPPPPPPPPPPPPPSLQRFADIVVYILEKNLGKERQEKLAEAARAAGFRVSDELNNDVTHIVAYNLTGSEVLKWLKESKVKLTNTPKLLKISWLEACLKAGKPVPITPEYILEIEPPEADPLVPPPPPPPPPPRPPLSPYACLRECGLEDKNKEFVDTFDILAKYYEFKNDKEKARKYRRAAAILRSLPFKIESMEDLEGIPHIGEEIRKIIEEILENGYSSEVQKILNDEYYQTYKLFTSIYGIGLKRAKKFYKKGYKTIEQVKADKSIKFSKKQKAGLKYYKDLVSPVTREEAEAIHKIIKEAVHKFLPDAEVELVGSFRRGHETSKDVDFLISSPTWTGDQSLLGKVLELLKEKGLLLYAEFTPSTYQPDQLPSTDPDAPLPFARVDMIFKLPLKGEEPSDKGREPGVKYKAVKVDLTLVPYERRAYALLWLTASPQFRRDLMEYAAKERGKLLSATALYDTKTNTFLEAETEQEIYDHLGLPYIPPELRNC T=0.2, sample=0, score=0.9219, seq_recovery=0.4665 APLPPVVPIPPPPPPPPPPPPPPPPPPSLQKFKDITVYILEKNLGSEKKKELQEKARAAGFNVSEELNDDVTHIVAENLSGTEVLELLEKSNFKLSNKPLLLKISWLEACIEAGEPVPITPEHILPVEEPPLLPLLPPPPPPPPPPPPPVSPYCCERRCSLEDKNQKFVDAFDILAEYARFRDDEATARRYRRAAAQLKSLPFRIESLEDLEGIPHIGPEIRAIIKEILENGYSSEVKAILNDPYFQTMRLFLSIYGIGLKRAKKFYRMGYTTVEQLKADKSIKFTEKQKAGLKYLKDLTRPVSREEALAIHEIIRAAVHAVLPDAEVELVGSFARGHETSRDVDFLISSPTWTEDQTILGKVIAALEADGLLLYHEYTPSTYKENDLPSTDVNAPIPFAKVDMILKLPRAPALPAEKGEEPGQDWRAVRVDLYLAPYDRRAYARLYLTGSPQFVADLREYAQEKRGMLLDYSRLYDLKEKKFIPAESEQEIYDHLGLEYIPPEERNC T=0.2, sample=0, score=0.9058, seq_recovery=0.4587 
GPPRPVVIRPPPPPPPPPPPPPPPVPPSLNKFRNITVYILEKNLGKKRRKELEEKARKAGFNVSEELNNNVTHIVAENLTGTEVEELLKKSKFKLKNKPKLLKISWLLESIKAGKPVPITPEHILKIEPPKEDPSIPPPPPEPPPPRPPLSPYACKRRTTLEDKNKKFVDTFNVLAEYYEFRKDKEKARKYRAAAAQLKSLPFEIKSIEDLEGIPHIGPEIRKIIKEILENGYSTEVEKILNDPYFQTYKLFTSIYGIGLSRADKFYKLGYTTIEEVKNDKSIKFTPKQKAGLKYLEDLQRPITKKEAEQIHKIIKEAVHKFLPDAEVELVGRFARGHETAKDVDFLISSPTLKGDQSFLQKVIDYLKEKGLLLYYEYTPSTYDPNRLPSRDIDAPSPFARVDMILKLPLEPEEPDEKGRKPGVNYRAVKVDLVLAPYERKAYAKLYFTASPQVREDLMRYAEEERGYLLSDDSLYDLKKKEFLKANSEEEIYKHLGLPYIPPTERNG T=0.2, sample=0, score=0.9172, seq_recovery=0.4114 KPKRKKKLKKRKPKPPPPPPPPPPPPPALQKFADITVYVLEKNMGKAAVEALKAKARAAGFTVSDELNNDVTHIVAKDMTGTEVKELLKASGKELKNEPLLLKISWLERSIKAGKPVEITPEDILPIEEPLEDPSLPPPPPPPPPPRPPLSPYACQRACGLRDLNAPFVAAFETLAEYYEFKDDRDRALRYRRAAAVLRSLPFRVRTLADLEGLPHIGPEIRALIAEILANGYSSEVEAVKADEYYITYKTFLSIYGIGLKRAKYFYAKGYTSIEELKADKSIKFTEKQKAGLKYLKDLRRPISKEEAEAIAEIIKRAVHEFLPDAKVEVVGRFARGHEESKDVDFLISSPTWKDDQTLLGKVIDKLKAEGLILYYEYTPSTYNPNDLPSTDPDAPPKFARVDMILKLPLTPEAPEEKDRPPGRDYRAVKVDLTLCPYERYAYALLYFTASPQFRRDLMLYAKEERGMLLDYSRLYDTKTKKFIKAKTVQEIFDALGLPYIPPELQNA T=0.2, sample=0, score=0.9068, seq_recovery=0.4567 SLPPVVVVRPPPPPPPPPPPPPPPPPPSLRRFANINVYILPENLGEREREALAARARAAGFNVSDELNNDVTHIVARNLTGTEVLEILEKSGIELKNTPLLLKISWLEESIRAGRPVPITPEHILPIEEPLADPTLPPPPPPPPPPRKPLSPYACQRETSLVNKNQAFVDTFDILAEAAEFRNDRETARRYRRAAAQLRSLPFPIRSEADLEGIPHIGPEIRAIIKEILENGYSSRVAAILADPYFRTKRLFTSVYGFGLATADRLYKRGYRTIEEVIADKSIKFTEEQKAGLKYLEDLRAPITREEAEAIHKIIKEAVHKFLPDAKVEMVGSFARGHKESKDVDFLISSPTLKGDQSVLQKVIDYLKEEGLLLYSKYTPSTYKPDDLPSRDPDAPLKFARVDMILKLPLTPAAPAEAGRKPGVDYKAVKVDLTLVPYSRYAYALLWLTASPQVREDLRLYAEEERGMLLDASALYDKKTKKFLPANSVEEIYAHLGLPYIPPELLNG
tmp, score=1.6412, fixed_chains=[], designed_chains=[‘A’], model_name=v_48_020 MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFTSVFGVGLKTSEKWFRMGFRTLSKVRSDKSLKFTRMQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGGFRRGKKMGHDVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALDHFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVDLVLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA T=0.2, sample=0, score=0.9274, seq_recovery=0.4429 PPPPPKVIRPPRPPPPPPPSPPPPPPPSLQKFKDIVVYVLEDNLGEKKRKELKEKLRAAGFTVSDKLNNDVTHIVAYNLTGTEVLDLIKASGIKLDNTPKLLKISWAEDCIEAGKPVEITPEYILPVEPPPKDKSIPPPPPPPPPPREPLSPYACKRRCSLIDYNKKFVDTFNILAEYYRFLNNSEKADKYNRAAAQLKSLPFEIKSMEDLEGIPHIYPEIRKIIEEILKNGYSTEVEKILNDPYFKTKKLFTSIYGFGLATADKYYKAGYTSIEEVKSDKSIKFSEEQKAGLKYLKDLTRPITREEALRIHEIIKEAVHAFLPDAIVELVGSFARGAETSRDVDFLISSPTWKGDQTLLEKVIEYLKEKGLLLYYKLTPSTYDPNALPSTDVNAPSPFQRVDMIFKLPLEEEEEQLGDRPPGKKWRAVKVDLTLVPYDRFAYARLYFTASPQFRRDLIEYARDERGMLLSSTSLYDLKKKEFISASSVEEIYAALGLPYIPPEELNC T=0.2, sample=0, score=0.9223, seq_recovery=0.4724 GPAPPVVVRPPPPPPPPPPPPPPPPPPSLQKFKDITVYVIEENLGKENREELEEKLRKAGFNVSERLNNDVTHIVGKNLTGSEVEELLKKSKIKLKNKPLLLKISWAEECIKAGKPVPITPEYILKVEPKLSLPLLPPPPPPPPPPPPPLSPYCCQRRCTLENKNQKFVDAFEILAKYAEFKDDRETAEKYRRAAAQLRSLPFAIKSVKDLEGIPHIDEEIRAIIEEILKNGYSSKVRAIKNDPYYQTMKLFTSIYGFGIGTAKKLYRAGYTSIEEVRADKSIKFTEEQKAGLKYYEDLVAPITREEAEQIHEVIKKAVHKFLPDAIVEMVGSFRRGAKESKDVDFLISSPTWKGDETVLQKVIDYLKKEGLVLYYKYTPSTYDPNQLPSEDIDAPPPFAKVDMIFKLPLKGESPAEKNRKPGRDWKAVKVDLVLTPYERYAYALLYFTASRQTRRDLIEYAKNERGMLLDYSSLYDKKENKFLEAETEEEIYAHLGLPYIPPEERNC T=0.2, sample=0, score=0.8952, seq_recovery=0.4567 
PPPPKVVIKPPKPPPPPPPPPPPPPPPGLRKFADITVYIIEKNLGKKKREKLKAAARAAGFNVADELSNDVTHIVGYNLTGTEVLELLKKSGINLKNKPLLLKISWLEESLKAGRPVPITPEHILPVEPKPELPLLPPPPPPPPPPRPPLSPYCCKRRCSLEDKNQKFVDAFNVLAEYYEFNNDEEKAEKYRKAAAQLRSLPFRIESMEDLEGIPNIGEEVREIIKEILENGYSSKVEEILNDPYYQTKKLFLSVYGFGKKTADKLYKAGYTTIEQVKNDKSIKFTPEQKAGLKYLKDLTAPVTREEAEAIAEIIREAVHAVLPDAEVEVVGSFRRGAETSKDVDFLISSPTWAPDQSVLSRVIDRLKQQGLLLYAKLTPSTYDPDALPSEDVNAPLPFARVDMILKLPLPPAAPERRGRPPGKDWKAVKVDLTLCPWERRAYALLWLTGSPQFVRDLREYAREERGMLLSASSLYDRKAGRFLPARSEEEIYAALGLPYIPPEQRNC T=0.2, sample=0, score=0.9096, seq_recovery=0.4508 PAPPPVVIVPPKPPPPPPPPPPPPPPPSLQRFADITVYILPENLGEARRAELKARARAAGFTVSDELNDDVTHIVAYNRTGTEVLDILKKSGFKLSNKPLLLKISWLEASLEAGRPVPVTPEHILPVEPPPQEPLVPPPPPPPPPPREPLSPYACQRETSLTDKNKKFVDTFKVLAEYARFKNDEELALRYERAAAQLKSLPFEIRSMEDLEGIPHIDEEIRAIIEEILKNGYSSKVQAILNDPYFQTHKLFTSIYGIGLKLADKFYKKGYTSIEQVIADKSIKFTPEQKAGLKYLEDLQAPVTREEAEQIHKIIKEAVHKHLPDAEVVLVGSFARGAETSDDVDFLISSPTWKDDQSLLLKVIEDLKEKGLLLYYKYTPSTYDPDALPSTDDNAPIPFARVDMILKLPLTGAEPASAGRKPGKDWKAVKVDLTLVPYERKAYALLWLTGSPQFRRDLRRYALEERGMLLDYSALYNLETQEFIEAKSVEEIFEALGLPYIPPELLNA T=0.2, sample=0, score=0.9305, seq_recovery=0.4469 APKPKKVIKKPKPPPPPPPLPPPPPPPSLQKFKDITVYVLPENLGEEEVKKLKKELREAGFNVSDELNNDVTHIVAKDLTGTEVLELLKKSNVKLTNTPLLLKISWALESIKAGKPVPITPEHILPIEEPLEDPTVPPPPPPPPPPPPPLSPYACKRKTTLKDYNQKFVDAFNILAEYYEMLGKKEEARKYRRAAAQLKSLPFRIKSLEDLEGIPHIGPEVRAIIKEILENGYSSKVKAILNDPYYQTMKLFLSVYGIGLSRAKRLYKKGYTTIEQAKNDKSIKFSKKQKAGFKYYEDLTAPITKEEALKIHKIIKEAVHKFLPDAEVELVGSFARGHDTSRDVDFLISSPTWKDDQTVLQKVIDLLQKEGLLLYYELTPSTYQPNQLPSTDIDAPPTFQRVDMIFKLPLDESKPDEKGRKPGRTYKAVLVDLVFVPYERKAYALLWLTMSRQFRADLIEYAKEERNYLLSYDSLYNLKEQKFIEAKSVQEIFDNLGLEYIPPTQRNG T=0.2, sample=0, score=0.9164, seq_recovery=0.4528 
PPPPVVVVRPPPPPPPPPPPPPPPPDPSLQKFKDITVYILEENLGEEERKELEEKARAAGFNVSKELNNDVTHIVAKNLTGTEVLELLESSNVVLENEPLLLKISWLEECIEAGKEVPITPEYILPVEEPLEVPLLPPPPPPPPPPRPPLSPYACKRRTSLKDLNQKFVDAFDILSQYYRMKNDSENARRYRRAAAQLKSLPFKITSMEDLEGIPHIGPEVRALIQEILENGYSSKVREILNDPYYQTMKLFLSVYGFGLSRANRLYKKGYTSIEQVKADPSIKFTPKQKAGLKYLEDLTRPVTKEEAEKIGEIIKEAVHKVLPDAKVTIVGSFARGAKTSKDVDFLISSPTLKNDQTVLQRVIDILKADGLLLYAELTPSTYDPNQLPSLDPNAPIPFEKVDMILKLPLSGEEEKEKGRKPGKNYRAVKVDLVLTPYERYAYALLYFTGSPQFVEDLRLYAREEKGMLLDYSNLYDTKKGEFLEAKTEKEIYDHLGLPYIPPEERNA T=0.2, sample=0, score=0.9163, seq_recovery=0.4705 SKPKKRIVKKPKPPAPPPPPPPPPPPPRLQRFADITVYILEKNLGKEAQDELKARARAAGFNVSDELNNDVTHIVAENLTGTEVLNLIKASGVELKNTPKLLKISWLEECIKAGRPVPITPEHILPVKPPLAVPGLPPPPPPPPPPRPPLSPYACKRKCSLENKNQKFVDAFNILADYYSFKNNTEKEIAYRRAAAQLKSLPFKITSMEDLEGIPHIGPEIRAIIEEILKNGYSSEVEKIKNDPYYQTMKLFTSVFGFGLSRAKKLYKKGYKTIEEVKNDPSIKFTPKQRAGLKYLEDLTSPVTREEAEAIGKIIKEAVHSVLPDAKVEIVGSFARGHKTSRDVDFLISSPTLTGDQSVLEKVLEKLKADGLLLYYELTPSTYDPDALPSEDPDAPPPFQRVEMILKLPLEGKEPEEAGRKPGRDWKAVRVDLVLAPYERYAYALLYFTNSPQTRRDLMEYARTERGKLLDYSRLYDKEKGEFIKAETEEEIYAALGLEYIPPELRNA T=0.2, sample=0, score=0.9087, seq_recovery=0.4469 APPPPVVPIPPPPPPPPPPPPPPPPPPALRKFSDITVYILEKNLGAAARDALTARARAAGFTVSDELNNDVTHIVARNLTGSEVLSWLKASGFKLKNKPLLLRIDWLEACLKAGRPVPITPEHILPVEPPPLDPSVPPPPPPPPPPPPPLSPYACKRRTSLKDLNQAFVDAFNVLAEYARMQKDTARADAYDRAAAQLKSLPFRVRTLEDLEGIPHIGADIRAIIAEILANGASSEVAAVLADPYFQTMKTFLSIYGIGLNTAEKFYKKGYTSIEQVRADKSIKFTPEQKAGLEYLEDLTRPVTRAEAEAIHRIIRAAVHAVLPDAQVELVGSFARGHETSRSVDFLISSPTWKGDQSLLGRVLERLRAEGLLLYAKLTPSTYKPDQLPSTDPNAPPPFQRVDMILKLPLPGEAPEAAGRPPGRDWRAVKVDLVLAPYERRAYARLWFTGSPQYRRDLMDYALKEKNMLLSYDSLYDLEKKEFLKAQTEQEIFDHLGLEYIPPEDRNA
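Output like the above is easier to compare once it is parsed into records. The sketch below is an assumption-laden helper, not part of ProteinMPNN: it relies only on the header layout visible above (T, sample, score, seq_recovery, then the sequence), and it treats a lower score as better, on the understanding that the score is an average negative log-probability. The native record at the top has a different header and is skipped.

```python
import re

# Matches one sampled design as it appears above: a header with temperature,
# score and sequence recovery, followed by the designed sequence.
RECORD = re.compile(
    r"T=(?P<T>[\d.]+), sample=(?P<sample>\d+), "
    r"score=(?P<score>[\d.]+), seq_recovery=(?P<recovery>[\d.]+)\s+"
    r"(?P<seq>[A-Z]+)"
)


def parse_designs(text):
    """Return a list of dicts (score, recovery, seq), one per sampled design."""
    designs = []
    for m in RECORD.finditer(text):
        designs.append({
            "score": float(m.group("score")),
            "recovery": float(m.group("recovery")),
            "seq": m.group("seq"),
        })
    return designs


def best_by_score(designs):
    """Pick the design with the lowest score (assumed to be the most favorable)."""
    return min(designs, key=lambda d: d["score"])
```

Sorting by score is only one possible selection rule; folding several candidates and comparing them is another.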
New Sequence:PPPPPKVIRPPRPPPPPPPSPPPPPPPSLQKFKDIVVYVLEDNLGEKKRKELKEKLRAAGFTVSDKLNNDVTHIVAYNLTGTEVLDLIKASGIKLDNTPKLLKISWAEDCIEAGKPVEITPEYILPVEPPPKDKSIPPPPPPPPPPREPLSPYACKRRCSLIDYNKKFVDTFNILAEYYRFLNNSEKADKYNRAAAQLKSLPFEIKSMEDLEGIPHIYPEIRKIIEEILKNGYSTEVEKILNDPYFKTKKLFTSIYGFGLATADKYYKAGYTSIEEVKSDKSIKFSEEQKAGLKYLKDLTRPITREEALRIHEIIKEAVHAFLPDAIVELVGSFARGAETSRDVDFLISSPTWKGDQTLLEKVIEYLKEKGLLLYYKLTPSTYDPNALPSTDVNAPSPFQRVDMIFKLPLEEEEEQLGDRPPGKKWRAVKVDLTLVPYDRFAYARLYFTASPQFRRDLIEYARDERGMLLSSTSLYDLKKKEFISASSVEEIYAALGLPYIPPEELNC
I then used ESMFold to see how the New Sequence folded. It was interesting.
ptm: 0.761 plddt: 93.088

It is different, but at the same time it is not; I am not sure what to make of it.
Then I wanted to try changing only the amino acids of the binding site. I used PyMOL to select the residues within 4 Å of the ligand, and asked ChatGPT to write code to make this possible.
The code is as follows; put it after print(f"Length of chain {chain} is {l}"):
# Positions to redesign (1-based along the chain); all other positions stay fixed.
positions_to_design = [253, 255, 256, 257, 258, 259, 260, 261, 262,
                       288, 332, 333, 336, 338, 340, 341, 342, 343,
                       345, 381, 397, 398, 405, 432, 434, 449, 450,
                       452, 457, 461]
fixed_positions_dict = {}
name = pdb_dict_list[0]['name']
fixed_positions_dict[name] = {}
for chain in designed_chain_list:
    chain_length = len(pdb_dict_list[0][f"seq_chain_{chain}"])
    # Fix every position that is not in the design list.
    fixed_positions = [i for i in range(1, chain_length + 1) if i not in positions_to_design]
    fixed_positions_dict[name][chain] = fixed_positions
print("fixed_positions_dict:")
print(fixed_positions_dict)
Then run the code.
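The snippet relies on variables defined earlier in the notebook (`pdb_dict_list`, `designed_chain_list`). As a standalone sketch of the same bookkeeping, the hypothetical helper below builds the fixed-position list for one chain and adds a range check. Its `invert` flag is an illustrative name, not a ProteinMPNN option; it covers the opposite case of keeping the selected residues and redesigning everything else.

```python
def fixed_positions(chain_length, selected, invert=False):
    """Return the 1-based positions that should be kept fixed.

    invert=False: redesign only `selected`, fix everything else.
    invert=True:  fix only `selected`, redesign everything else.
    """
    selected = set(selected)
    out_of_range = sorted(p for p in selected if not 1 <= p <= chain_length)
    if out_of_range:
        raise ValueError(f"Positions outside the chain: {out_of_range}")
    if invert:
        return sorted(selected)
    return [i for i in range(1, chain_length + 1) if i not in selected]
```

Inside the loop, `fixed_positions_dict[name][chain] = fixed_positions(chain_length, positions_to_design)` would then give the same result as the list comprehension.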
tmp, score=1.1791, fixed_chains=[], designed_chains=[‘A’], model_name=v_48_020 MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFTSVFGVGLKTSEKWFRMGFRTLSKVRSDKSLKFTRMQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGGFRRGKKMGHDVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALDHFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVDLVLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA T=0.2, sample=0, score=0.8505, seq_recovery=0.6000 MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFTSVYGIGLKTSEKWFRMGFRTLSKVRSDKSLKFTRRQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGSFRRGHKTSRKVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALPEFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVELVLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA T=0.2, sample=0, score=0.7625, seq_recovery=0.7000 MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFTSVYGVGLKTSEKWFRMGFRTLSKVRSDKSLKFTRTQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGSFRRGHKESRDVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALSPFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVDLVLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA T=0.2, sample=0, score=0.8259, seq_recovery=0.6667 
MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFTSVYGVGLKTSEKWFRMGFRTLSKVRSDKSLKFTRSQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGSFRRGHKTSREVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALPDFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVDLVLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA T=0.2, sample=0, score=0.7350, seq_recovery=0.7000 MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFTSVYGVGLKTSEKWFRMGFRTLSKVRSDKSLKFTRSQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGSFRRGHKTSRDVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALSPFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVDLVLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA T=0.2, sample=0, score=0.7073, seq_recovery=0.6667 MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFTSVYGIGLKTSEKWFRMGFRTLSKVRSDKSLKFTRRQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGSFRRGHKTSRDVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALSPFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVDLVLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA T=0.2, sample=0, score=0.7011, seq_recovery=0.7000 
MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFTSVYGVGLKTSEKWFRMGFRTLSKVRSDKSLKFTRTQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGSFRRGHKTSRDVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALSPFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVDLVLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA T=0.2, sample=0, score=0.7647, seq_recovery=0.6667 MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFRSVYGVGLKTSEKWFRMGFRTLSKVRSDKSLKFTRTQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGSFRRGHKTSRDVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALPPFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVDLVLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA T=0.2, sample=0, score=0.6985, seq_recovery=0.7000 MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFTSVYGVGLKTSEKWFRMGFRTLSKVRSDKSLKFTRTQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGSFRRGAKTSRDVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALSPFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVDLVLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA
tmp, score=1.1870, fixed_chains=[], designed_chains=[‘A’], model_name=v_48_020 MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFTSVFGVGLKTSEKWFRMGFRTLSKVRSDKSLKFTRMQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGGFRRGKKMGHDVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALDHFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVDLVLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA T=0.2, sample=0, score=0.7722, seq_recovery=0.6667 MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFTSVYGVGLKTSEKWFRMGFRTLSKVRSDKSLKFTRTQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGSFRRGAKESRDVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALSPFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVDLTLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA T=0.2, sample=0, score=0.7871, seq_recovery=0.7000 MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFTSVFGVGLKTSEKWFRMGFRTLSKVRSDKSLKFTRKQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGSFRRGHKTSRSVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALSPFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVDLVLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA T=0.2, sample=0, score=0.7977, seq_recovery=0.7000 
MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFTSVYGVGLKTSEKWFRMGFRTLSKVRSDKSLKFTRRQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGSFRRGAKESRDVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALPDFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVDLVLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA T=0.2, sample=0, score=0.6841, seq_recovery=0.7333 MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFTSVYGVGLKTSEKWFRMGFRTLSKVRSDKSLKFTRRQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGSFRRGHKTSRDVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALDPFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVDLVLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA T=0.2, sample=0, score=0.6955, seq_recovery=0.6333 MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFTSVYGIGLKTSEKWFRMGFRTLSKVRSDKSLKFTRTQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGSFRRGHKTSRDVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALSPFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVDLTLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA T=0.2, sample=0, score=0.7886, seq_recovery=0.6000 
MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFTSVYGIGLKTSEKWFRMGFRTLSKVRSDKSLKFTRTQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGRFRRGHKTSRDVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALPDFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVVLTLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA T=0.2, sample=0, score=0.7773, seq_recovery=0.7000 MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFTSVYGVGLKTSEKWFRMGFRTLSKVRSDKSLKFTRTQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGSFRRGHKTARDVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALPPFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVDLVLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA T=0.2, sample=0, score=0.7904, seq_recovery=0.6667 MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFTSVYGVGLKTSEKWFRMGFRTLSKVRSDKSLKFTRTQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGSFRRGAKTSRDVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALPPFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVVLVLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA
New Sequence:MDPPRASHLSPRKKRPRQTGALMASSPQDIKFQDLVVFILEKKMGTTRRAFLMELARRKGFRVENELSDSVTHIVAENNSGSDVLEWLQAQKVQVSSQPELLDVSWLIECIRAGKPVEMTGKHQLVVRRDYSDSTNPGPPKTPPIAVQKISQYACQRRTTLNNCNQIFTDAFDILAENCEFRENEDSCVTFMRAASVLKSLPFTIISMKDTEGIPCLGSKVKGIIEEIIEDGESSEVKAVLNDERYQSFKLFTSVYGVGLKTSEKWFRMGFRTLSKVRSDKSLKFTRTQKAGFLYYEDLVSCVTRAEAEAVSVLVKEAVWAFLPDAFVTMTGSFRRGAKESRDVDFLITSPGSTEDEEQLQKVMNLWEKKGLLLYYDLVESTFEKLRLPSRKVDALSPFQKCFLIFKLPRQRVDSDQSSWQEGKTWKAIRVDLTLCPYERRAFALLGWTGSRQFERDLRRYATHERKMILDNHALYDKTKRIFLKAESEEEIFAHLGLDYIEPWERNA
This is a library with only the binding site changed. Some variants will presumably perform better and some worse. If I want to optimize the structure instead, I can do the inverse: keep the binding site fixed and redesign the rest.
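To confirm that a design differs from the native sequence only at the chosen positions, the two sequences can be compared directly. This is a minimal sketch with hypothetical function names. Both strings must have the same length, so it is safest to compare against the native sequence printed at the top of the same ProteinMPNN output. The reported seq_recovery values look consistent with being computed over the 30 designed positions only (for example, 0.6667 = 20/30), although that is an inference from the numbers rather than something the output states.

```python
def list_substitutions(native, design):
    """Return substitutions such as 'F256Y' between two equal-length sequences."""
    if len(native) != len(design):
        raise ValueError("Sequences must have the same length")
    return [
        f"{a}{i}{b}"
        for i, (a, b) in enumerate(zip(native, design), start=1)
        if a != b
    ]


def recovery_over(native, design, positions):
    """Fraction of the given 1-based positions where the design keeps the native residue."""
    positions = list(positions)
    same = sum(native[p - 1] == design[p - 1] for p in positions)
    return same / len(positions)
```

Any substitution reported outside `positions_to_design` would indicate that the fixed-position dictionary was not applied as intended.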
Part D. Group Brainstorm on Bacteriophage Engineering
I think the dependency on DnaJ can be removed computationally, using software like ProteinMPNN to do inverse folding of the sequence or of other sequences, EvolvePro to do in silico mutagenesis and explore variants, Foldseek to search for other naturally occurring sequences with the same structure, ESMFold to corroborate the 3D structure, etc.
One idea I have is to anchor a synthetic domain to the L protein to help the conformational change, so that one L protein could help another without the need for DnaJ. The interactions may be difficult to establish, but this could help phages accumulate upon oligomerization. The problem is the size of the MS2 genome: the extra domain would make it bigger, with less efficient translation and replication, and less RNA packaged in the capsid.
Another idea is to use the Lodj mutant but with a low translation efficiency, changing the nucleotides involved in the translation of the peptide, so that enough Lodj accumulates for lysis. Even the replication rate could be manipulated.
About the first idea, the pipeline could be: identify the residues in DnaJ that interact with the L protein, and their positions in the three-dimensional structure. Then use RFdiffusion or other software to set those residues as constraints and build a backbone for that domain (small if possible), or maybe use the entire domain from DnaJ. Check folding, stability, etc. Anchor that domain to the L protein with a linker. It would be good if the domain did not interact with the N-terminal domain of the same protein; a good design could help with this. Then test for interactions: how the domain interacts with the N-terminal domain compared with DnaJ, different positions to place it, optimization, etc. As for tools: AlphaFold-Multimer and Boltz-1 for the interactions, ProteinMPNN for the inverse folding, ESMFold to check the 3D structure, and EvolvePro and ESM-2 for mutational scoring during optimization. To find the residues involved in the interactions, use AlphaFold-Multimer and Boltz-1 to get the PDB and a visualization tool to see the residues.
A pitfall could be not getting enough data on the interactions between DnaJ and the L protein, or that the designed peptides don't work well because of the limits of the tools.
For the second idea, the pipeline could be: identify the translational motif that interacts with the ribosome or other translational machinery, find the sequences, see whether there are data relating sequence to more or less efficient translation, and design the new sequence.