Jun-12-2019, 02:34 PM
Hello,
I need to write an algorithm that generates all possible protein interaction chains of size x (x proteins in the chain).
I have a test file that lists binary interactions, one pair per line (1 interacts with 4 and 6, 4 interacts with 9, and so on):
1 4
1 6
4 9
6 9
9 33
9 66
4 1
6 1
9 4
9 6
33 9
66 9
Here is my Python script:

import sys
sys.setrecursionlimit(1000000)
from collections import defaultdict

dd = defaultdict(set)
with open("C:/Users/lveillat/Desktop/Données stage/Fichiers tests/testchaines3.txt", "r") as f1:
    for ligne in f1:
        lp = ligne.rstrip('\n').split(" ")
        prot1 = lp[0]  # first protein of the interaction
        prot2 = lp[1]  # second protein of the interaction
        # Dictionary: key = first protein, value = set of proteins it can interact with
        dd[prot1].add(prot2)
print(dd)

def chain(maillon, pathway, limite=4):
    # next_ = proteins that interact with the last protein of the pathway
    next_ = maillon.get(pathway[-1], None)
    # If no protein interacts with the last protein of the pathway,
    # or if the size limit is reached, emit the chain and move on
    if next_ is None or len(pathway) >= limite:
        yield pathway
    else:
        # Interacting proteins exist and the size limit is not reached
        for m in next_:
            # Avoid repeating the same protein within one chain
            if m not in pathway:
                yield from chain(maillon, pathway + [m])

for k in dd:  # for each protein in the dictionary
    for z in chain(dd, pathway=[k]):
        print(' '.join(z))

And here is the output of my script:
defaultdict(<class 'set'>, {'1': {'6', '4'}, '4': {'1', '9'}, '6': {'1', '9'}, '9': {'6', '33', '66', '4'}, '33': {'9'}, '66': {'9'}})
1 6 9 33
1 6 9 66
1 6 9 4
1 4 9 6
1 4 9 33
1 4 9 66
4 1 6 9
4 9 6 1
6 1 4 9
6 9 4 1
9 6 1 4
9 4 1 6
33 9 6 1
33 9 4 1
66 9 6 1
66 9 4 1
As you can see, the algorithm gives me the interaction chains of size 4.
However, all the shorter chains, the ones that never reach size 4 because they hit a dead end first, are missing. For example:
66 9 33
33 9 66
9 33
...
The fix is probably small, but I have been stuck on this for several days without success.
If you have time, I would be very grateful for any help.
If you have questions or need clarification, do not hesitate to ask.
Thank you
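
One possible adjustment, offered as a sketch rather than a definitive fix: in the script above, a chain such as 66 9 33 enters the else branch (33 does have a partner, 9), but 9 is already in the chain, so the loop yields nothing and the chain is silently dropped. The version below filters out already-used proteins first and yields the chain whenever no unused partner remains. The names `build_graph`, `chains`, `pairs` and `found` are hypothetical, and the interactions are written inline instead of being read from the test file.

```python
from collections import defaultdict

def build_graph(pairs):
    """Map each protein to the set of proteins it interacts with."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
    return graph

def chains(graph, pathway, limit=4):
    """Yield chains that reached the size limit or cannot be extended."""
    # Partners of the last protein that are not already in the chain
    candidates = [m for m in graph.get(pathway[-1], ()) if m not in pathway]
    if len(pathway) >= limit or not candidates:
        # Size limit reached, or dead end: emit the chain as it stands
        yield pathway
    else:
        for m in candidates:
            # Pass the limit along so a non-default value is respected
            yield from chains(graph, pathway + [m], limit)

# Same interactions as the test file, both directions listed
pairs = [("1", "4"), ("1", "6"), ("4", "9"), ("6", "9"), ("9", "33"), ("9", "66"),
         ("4", "1"), ("6", "1"), ("9", "4"), ("9", "6"), ("33", "9"), ("66", "9")]
graph = build_graph(pairs)
found = {tuple(c) for start in list(graph) for c in chains(graph, [start])}
for c in sorted(found):
    print(" ".join(c))
```

With this change the dead-end chains (66 9 33, 33 9 66, 9 33) appear alongside the size-4 chains. Note that it only reports chains that cannot grow any further; if every prefix of every chain is wanted as well, the `yield pathway` would have to happen on each call instead.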