Jul-11-2019, 02:11 PM
Hello everyone,
I create an algorithm that will give me the path of the shortest interactions between 2 proteins, I explain:
I have a first file that contains the pairs of proteins to test (ex):
prot1 prot2 (the string will start with prot1 and end with prot2)
prot3 prot4
ect ..
My second file is a binary interactions file between different proteins, eg:
prot15 prot20
prot3 prot1
prot100 prot21
prot1 prot16
prot16 prot2
ect ...
for example here the shortest path between prot1 and prot2 is: prot1 prot16 prot2
Here is my code:
Thank you :)
I create an algorithm that will give me the path of the shortest interactions between 2 proteins, I explain:
I have a first file that contains the pairs of proteins to test (ex):
prot1 prot2 (the string will start with prot1 and end with prot2)
prot3 prot4
ect ..
My second file is a binary interactions file between different proteins, eg:
prot15 prot20
prot3 prot1
prot100 prot21
prot1 prot16
prot16 prot2
ect ...
for example here the shortest path between prot1 and prot2 is: prot1 prot16 prot2
Here is my code:
import sys sys.setrecursionlimit(1000000) from collections import defaultdict dd = defaultdict(set) P=[] L=[] #File containing the test proteins, I put all the couples in a list with open("C:/Users/lveillat/Desktop/Chaines minimales/3080_interactions_pour_736.txt","r") as f0: for lignes0 in f0: P.append(lignes0.rstrip("\n").split(" ")[:2]) #File containing all the interactions, I make a dictionary with a protein key and value all the proteins with which it interacts with open("736(int_connues_avec_scores_sup_0).txt","r") as f1: for lignes in f1: if lignes.startswith('9606'): lignes=lignes.rstrip('\n').split(" ") prot1=lignes[2] prot2=lignes[3] dd[prot1].add(prot2) #print(dd) #Function allowing me to build interchange channels def chain(proteine1, proteine2, maillon, pathway, limite=10): next_= maillon.get(pathway[-1],None) if len(pathway) < limite : for m in next_: if m not in pathway: if m != proteine2: yield from chain(proteine1, proteine2, maillon, pathway + [m]) elif pathway[0]==proteine1:: pathway.append(m) yield pathway #For my protein couples to study for c in P: #print(c) proteine1=c[0] proteine2=c[1] L.clear() print("The first protein in the chain is", proteine1) print("The last protein in the chain is", proteine2) print("") print("Minimal size chain(s):") print("") #I put in a list all possible chains of interactions starting with protein1 and ending with protein2 for k in dd: for z in chain(proteine1,proteine2, dd, pathway = [k]): if z not in L: L.append(z) # I display the smallest elements of the list = the shortest chains min_len=min(len(z) for z in L) mins=[z for z in L if len(z) == min_len] for chaine in mins: if chaine[-1]==proteine2: print(' '.join(chaine)) else: print('ERROR - NO CHAINS BELOW THE LIMIT', ) print("") print("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx") print("")The problem is that it is extremely slow ... are there lines that you think can be optimized and save me computing time?
Thank you :)