Python Forum
Protein interactions
#1
Hello everyone,

I need to build protein interaction chains, each composed of 10 proteins.

To do this, I created a dictionary of this form:

Protein 1 = protein 2, protein 5, protein 6 (protein 1 can interact with protein 2, 5 or 6).
Protein 2 = protein 1, protein 7, protein 8 (protein 2 can interact with protein 1, 7 or 8).
...
Protein 7 = protein 34, protein 43
...
Protein 43 = protein 74, protein 76

etc. (I have over 20,000 proteins.)
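A dictionary like this maps naturally onto a Python `dict` of sets. Below is a minimal sketch, using only the example proteins above, of checking whether a sequence of proteins forms a valid chain. The names `interactions` and `is_valid_chain`, and the rule that a protein may not appear twice in the same chain, are assumptions for illustration, not something stated in the question.

```python
from typing import Dict, List, Set

# Toy version of the dictionary above; the real one would have 20,000+ keys.
interactions: Dict[str, Set[str]] = {
    "P1": {"P2", "P5", "P6"},
    "P2": {"P1", "P7", "P8"},
    "P7": {"P34", "P43"},
    "P43": {"P74", "P76"},
}

def is_valid_chain(chain: List[str], graph: Dict[str, Set[str]]) -> bool:
    """Return True if each protein interacts with the next one and none repeats.

    The no-repeat rule is an assumption, not stated in the question.
    """
    if len(set(chain)) != len(chain):
        return False
    # Every consecutive pair (a, b) must have b listed among a's partners.
    return all(b in graph.get(a, set()) for a, b in zip(chain, chain[1:]))

print(is_valid_chain(["P1", "P2", "P7", "P43", "P74"], interactions))  # True
print(is_valid_chain(["P1", "P7"], interactions))  # False: P7 is not a partner of P1
```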

Ultimately, I need every possible interaction chain of 10 proteins, for example:

P1 P2 P7 P43 P74 ...
P1 P5 ...
P1 P6 ...
P2 P1 ...
P2 P7 P34 ...

etc.
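Before generating every chain, it may be worth estimating how many there are: with more than 20,000 proteins, the total can be far too large to write to a file. The sketch below (the function name `count_walks` is hypothetical) uses dynamic programming to count the sequences of a given length in which each protein interacts with the next. Because it lets a protein reappear, the result is only an upper bound on the number of chains without repeats.

```python
from typing import Dict, Set

def count_walks(graph: Dict[str, Set[str]], length: int) -> int:
    """Count sequences of `length` proteins where each interacts with the next.

    Proteins may repeat, so this is an upper bound on repeat-free chains.
    """
    nodes = set(graph) | {p for partners in graph.values() for p in partners}
    ways = {node: 1 for node in nodes}  # sequences of 1 protein starting at each node
    for _ in range(length - 1):
        # Sequences starting at `node` = sum over its partners of sequences one shorter.
        ways = {node: sum(ways[p] for p in graph.get(node, ())) for node in nodes}
    return sum(ways.values())

interactions = {
    "P1": {"P2", "P5", "P6"},
    "P2": {"P1", "P7", "P8"},
    "P7": {"P34", "P43"},
    "P43": {"P74", "P76"},
}
print(count_walks(interactions, 3))  # 10 (includes P1 P2 P1 and P2 P1 P2)
```

Each step touches every interaction once, so the cost grows with the number of interactions times the chain length rather than with the number of chains.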

I have no idea how to code an algorithm for this. Can you help me?

Thank you
#2
Can you give simplified examples of what your input and output should look like? And which part are you struggling with in particular?
#3
Here is an excerpt from my file, in which each line describes an interaction between two proteins:

NDUFAF7 NDUFS7 126 0 0 112 119 900 708 976 ---> EXP
NDUFAF7 NDUFS2 295 0 0 93 50 900 913 993 ---> EXP
NDUFAF7 NDUFS3 222 0 216 86 50 900 616 974 ---> EXP
FUCA2 FUCA1 0 0 60 64 187 900 72 921 ---> EXP
HS3ST1 GPC6 0 0 0 0 96 900 439 944
HS3ST1 GPC3 0 0 0 0 96 900 458 946
ARF5 COPE 0 0 0 191 160 900 83 929 ---> EXP
ARF5 DCTN1 0 0 0 95 59 900 69 910 ---> EXP
M6PR LRP2 0 0 0 0 320 900 257 945

Here, for example, FUCA2 interacts with FUCA1.
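Each line of this excerpt can be split on whitespace: the first two fields are the interacting proteins, and the optional last field marks experimental evidence (`EXP`). Below is a sketch of loading such lines into a dictionary of sets, keeping only the `EXP` lines as the script below does. Adding each pair in both directions is an assumption based on the dictionary in the first post (protein 1 lists protein 2 and vice versa); the function name `load_interactions` and the file name in the comment are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

def load_interactions(lines: Iterable[str], symmetric: bool = True) -> Dict[str, Set[str]]:
    """Build {protein: set of partners} from lines like the excerpt above."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        fields = line.split()  # splits on any run of whitespace, drops the newline
        if len(fields) < 2 or fields[-1] != "EXP":
            continue  # keep only lines marked EXP
        prot1, prot2 = fields[0], fields[1]
        graph[prot1].add(prot2)
        if symmetric:
            graph[prot2].add(prot1)  # assumption: interactions work both ways
    return dict(graph)

sample = [
    "NDUFAF7 NDUFS7 126 0 0 112 119 900 708 976 ---> EXP\n",
    "FUCA2 FUCA1 0 0 60 64 187 900 72 921 ---> EXP\n",
    "HS3ST1 GPC6 0 0 0 0 96 900 439 944\n",
]
print(load_interactions(sample)["FUCA1"])  # {'FUCA2'}

# With a real file (hypothetical name):
# with open("interactions.txt", encoding="utf-8") as f:
#     graph = load_interactions(f)
```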

Here is my script so far:

from collections import defaultdict
dico_prot1_prot2 = defaultdict(set)

with open("C:/Users/lveillat/Desktop/Données stage/Données/resultats_matrice_avec_scores_sans_localisation.txt", "r") as f1:
    for ligne in f1:
        lp = ligne.rstrip('\n').split(" ")
        if lp[-1] == "EXP":  # Keep only interactions marked EXP
            prot1 = lp[0]
            prot2 = lp[1]
            dico_prot1_prot2[prot1].add(prot2)


with open("chainespostfiltrage.tsv", "w") as f2:
    for prot1 in dico_prot1_prot2:  # Walk through each protein
        tmpchaine = set()  # Initialize the chain with the first protein
        tmpchaine.add(prot1)
        for prot2 in dico_prot1_prot2:  # Walk through the dictionary again
            if prot1 != prot2 and prot1 in dico_prot1_prot2[prot2]:  # If the proteins differ and prot2 lists prot1 as a partner
                if len(tmpchaine) < 10:  # If the chain has fewer than 10 proteins
                    tmpchaine.add(prot2)  # add the interacting protein to the chain
                elif len(tmpchaine) == 10:  # If the chain contains 10 proteins
                    chaine = " ".join(tmpchaine)
                    f2.write(chaine + "\n")
                    print(chaine)
                    tmpchaine = set()  # Empty the chain because it has reached the desired size (10 proteins)
Finally, here is the output of my script:

Output:
CAPZA1 KIF22 YKT6 BET1 KIF11 KIF23 KDELR2 COPE COPB1 ARF5
ARCN1 ANK1 ANK3 KIFC2 KIF2B KIF6 ARFGAP3 KIF3C CENPE TMED10
ANK2 KIF9 SPTBN1 ARFGAP1 COPG1 RAB1B KDELR1 SPTBN5 KIF15 COPB2
KIF20B ACTR1A KIF3B GBF1 COPA DYNLL1 CAPZA2 KIF19 KIF12 KIF2C
KIF3A RAB1A KDELR3 KIF25 CAPZB KIFC1 KIF26A KIF2A KIF26B COPG2
EP300 HSP90AA1 PPP5C FKBP4 PPID ESR2 STIP1 MAPK3 FOXA1 DCTN1
PSMD10 DVL2 PSMD11 PSMA3 CFTR PSMC1 VAMP7 PSMA6 PSMD7 PSME2
Unfortunately, although the results have the form I want, many interaction chains are missing, and I don't know how to fix that.
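The script above misses chains because its inner loop only adds proteins that list `prot1` as a partner: it collects the direct partners of one protein and never follows a partner's own partners. In addition, a `set` does not keep the order of the chain. Below is a sketch of a depth-first search with an explicit stack that extends each partial chain with every partner of its last protein until it reaches the target length. The function name `write_chains`, the rule that a protein appears at most once per chain, and the assumption that the dictionary already lists each interaction in both directions are illustrative choices, not part of the original question.

```python
import io
from typing import Dict, Set, TextIO

def write_chains(dico_prot1_prot2: Dict[str, Set[str]], length: int, out: TextIO) -> int:
    """Write every chain of `length` distinct proteins, one per line, to `out`.

    Depth-first search with an explicit stack; returns the number of chains written.
    """
    written = 0
    for start in sorted(dico_prot1_prot2):
        stack = [[start]]  # each entry is a partial chain; a list keeps the order
        while stack:
            chain = stack.pop()
            if len(chain) == length:
                out.write(" ".join(chain) + "\n")
                written += 1
                continue
            # Push partners in reverse string order so the smallest is explored first.
            for partner in sorted(dico_prot1_prot2.get(chain[-1], set()), reverse=True):
                if partner not in chain:  # assumption: no protein repeats in a chain
                    stack.append(chain + [partner])  # new list, so sibling branches are unaffected
    return written

# Usage on the toy dictionary from the first post:
toy = {
    "P1": {"P2", "P5", "P6"},
    "P2": {"P1", "P7", "P8"},
    "P7": {"P34", "P43"},
    "P43": {"P74", "P76"},
}
buffer = io.StringIO()
print(write_chains(toy, 5, buffer))  # 2
print(buffer.getvalue(), end="")
# P1 P2 P7 P43 P74
# P1 P2 P7 P43 P76

# With the real data (dico_prot1_prot2 built as in the script above):
# with open("chainespostfiltrage.tsv", "w") as f2:
#     write_chains(dico_prot1_prot2, 10, f2)
```

The explicit stack avoids recursion, and because each chain is written as soon as it is complete, nothing has to be held in memory except the chains still being extended. With 20,000 proteins the number of 10-protein chains may still be very large, so it may be necessary to restrict the starting proteins.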

Thanks for your help