Python Forum
Optimizing a code
#1
Hello everyone,

I am writing an algorithm that finds the shortest chain of interactions between two proteins. Let me explain:

I have a first file that contains the pairs of proteins to test, for example:

prot1 prot2 (the chain will start with prot1 and end with prot2)

prot3 prot4

etc.

My second file lists binary interactions between proteins, e.g.:

prot15 prot20

prot3 prot1

prot100 prot21

prot1 prot16

prot16 prot2

etc.

For example, here the shortest path between prot1 and prot2 is: prot1 prot16 prot2
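
To make the example concrete, here is a minimal sketch of how those sample lines end up in a dictionary of sets. It assumes the simplified two-column format shown above (my real file has extra columns), and the names `sample_lines` and `interactions` are only illustrative:

```python
from collections import defaultdict

# Sample interaction lines copied from the example above
# (assumption: simplified two-column format, one pair per line)
sample_lines = [
    "prot15 prot20",
    "prot3 prot1",
    "prot100 prot21",
    "prot1 prot16",
    "prot16 prot2",
]

# Map each protein to the set of proteins listed after it
interactions = defaultdict(set)
for line in sample_lines:
    source, target = line.split(" ")[:2]
    # Stored in one direction only: source -> target
    interactions[source].add(target)

print(dict(interactions))
```

Following prot1 to prot16 and then prot16 to prot2 in this dictionary gives the chain prot1 prot16 prot2. Note that "prot3 prot1" is stored only as prot3 -> prot1, so prot1 does not list prot3 as a partner.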

Here is my code:

import sys
sys.setrecursionlimit(1000000)
from collections import defaultdict
dd = defaultdict(set)
P=[]
L=[]
 
#File containing the test proteins, I put all the couples in a list
with open("C:/Users/lveillat/Desktop/Chaines minimales/3080_interactions_pour_736.txt","r") as f0:
    for lignes0 in f0:
        P.append(lignes0.rstrip("\n").split(" ")[:2])

 
#File containing all the interactions: I build a dictionary mapping each protein to the set of proteins it interacts with
with open("736(int_connues_avec_scores_sup_0).txt","r") as f1:
    for lignes in f1:
        if lignes.startswith('9606'):
            lignes=lignes.rstrip('\n').split(" ")
            prot1=lignes[2]
            prot2=lignes[3]
            dd[prot1].add(prot2)
    #print(dd)
 
#Function that builds the chains of interactions
def chain(proteine1, proteine2, maillon, pathway, limite=10):
    # Fall back to an empty tuple when the last protein has no known partners
    next_ = maillon.get(pathway[-1], ())
 
    if len(pathway) < limite  :
        for m in next_:
            if m not in pathway:
 
                if m != proteine2:
                    yield from chain(proteine1, proteine2, maillon, pathway + [m])
 
                elif pathway[0] == proteine1:
                    # Yield a new list so the caller's pathway is not mutated
                    yield pathway + [m]
 
#For my protein couples to study
for c in P:
    #print(c)
    proteine1=c[0]
    proteine2=c[1]
    L.clear()
    print("The first protein in the chain is", proteine1)
    print("The last protein in the chain is", proteine2)
    print("")
    print("Minimal size chain(s):")
    print("")
 
    #I put in a list all possible chains of interactions starting with protein1 and ending with protein2
    for k in dd:
        for z in chain(proteine1,proteine2, dd, pathway = [k]):
            if z not in L:
                L.append(z)

 
    # Display the shortest chains in the list
    if L:
        min_len = min(len(z) for z in L)
        mins = [z for z in L if len(z) == min_len]
        for chaine in mins:
            print(' '.join(chaine))
    else:
        print('ERROR - NO CHAINS BELOW THE LIMIT')
 
    print("")
    print("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
    print("")
The problem is that it is extremely slow. Are there lines that you think could be optimized to save computing time?

Thank you :)
Messages In This Thread
Optimizing a code - by Amniote - Jul-11-2019, 02:11 PM
RE: Optimizing a code - by Gribouillis - Jul-11-2019, 03:48 PM
RE: Optimizing a code - by Amniote - Jul-11-2019, 04:08 PM
RE: Optimizing a code - by perfringo - Jul-11-2019, 03:57 PM
RE: Optimizing a code - by Gribouillis - Jul-11-2019, 05:33 PM
