Similarity network - Absolumentpasadrien - Apr-05-2019

Hello all,

I have to write code to convert a .blastp file into a .sif one. For those of you who don't know what these are, a .blastp file looks like this:

    ProtA ProtB xx xxxxx xx xx.xxx E-value
    ProtA ProtC xx xxxxx xx xx.xxx E-value
    ProtD ProtE xx xxxxx xx xx.xxx E-value
    .
    .
    .

and a .sif file looks like this:

    ProtA <relationship type> ProtB
    ProtC <relationship type> ProtA
    ProtD <relationship type> ProtE ProtF ProtB
    .
    .
    .

where Prot stands for a protein.

I need to ask the user for a threshold S. If S is greater than the E-value, the two proteins are linked and I have to put them in the .sif file as:

    ProtA linked ProtB

I started by creating a dictionary that stores the first protein as a key and all the proteins linked to it as a list in the value of that key. That worked well, but now I am stuck at writing the .sif file. Here is my code so far:

    import re

    namF = "xxxxxxxxxxx.blastp"
    of = open(namF,'r')
    Sif="Output.sif"
    fo = open(Sif, 'w')
    S = str(input('Choisissez le seuil '))  # prompt: "Choose the threshold"
    dico={}
    ligne = of.readline()
    while (ligne != ""):
        ligne.rstrip('\n')
        L= ligne.split('\t')
        E= L[-1:]
        A=L[0]
        B=L[1]
        if E[0] <= S:
            if A in dico:
                dico[A].append(B)
            else:
                dico[A]=list()
                dico[A].append(B)
            if B in dico:
                dico[B].append(A)
            else:
                dico[B]=list()
                dico[B].append(A)
        for key in dico:
            fo.write(key+ ' lié '+ dico[key]+'\n')  # ' lié ' means ' linked '
        ligne= of.readline()
    print(D)

Then I tried to put it into functions, as required:

    import re

    namF = "xxxxxxxxxxx.blastp"
    of = open(namF,'r')
    Sif="Output.sif"
    fo = open(Sif, 'w')
    S = str(input('Choisissez le seuil '))  # prompt: "Choose the threshold"

    def sif(S):
        for key in dico:
            fo.write(key+ ' lié '+ dico[key]+'\n')  # ' lié ' means ' linked '

    def dic():
        dico={}
        ligne = of.readline()
        while (ligne != ""):
            ligne.rstrip('\n')
            L= ligne.split('\t')
            E= L[-1:]
            A=L[0]
            B=L[1]
            if E[0] <= S:
                if A in dico:
                    dico[A].append(B)
                else:
                    dico[A]=list()
                    dico[A].append(B)
                if B in dico:
                    dico[B].append(A)
                else:
                    dico[B]=list()
                    dico[B].append(A)
            ligne= of.readline()
        print(D)

I tried to use my dic function in the
sif one to create my output.sif, but I can't manage to do it. Any help? Thanks

RE: Similarity network - Gribouillis - Apr-05-2019

At the end of dic(), you need to return dico; then in sif() you can write

    for key, value in dic().items():
        ...

Also, don't forget to call the sif() function.
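The advice above (return the dictionary from one function, then iterate over its items in the other) can be sketched as follows. This is a minimal illustration under stated assumptions, not the poster's actual code: the names build_links and write_sif, the "linked" label, and the two sample lines with made-up numbers are all hypothetical. The E-value is converted to float so that the threshold comparison is numeric rather than a string comparison.

```python
import io

def build_links(lines, threshold):
    """Return a dict mapping each protein to the list of proteins linked to it."""
    links = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        left, right, e_value = fields[0], fields[1], float(fields[-1])
        if e_value <= threshold:
            links.setdefault(left, []).append(right)
            links.setdefault(right, []).append(left)
    return links  # returning the dict is what makes it usable by the caller

def write_sif(links, out):
    """Write one 'source linked target' line per stored pair."""
    for protein, partners in links.items():
        for partner in partners:
            out.write(f"{protein} linked {partner}\n")

# Hypothetical sample lines; the numeric columns are invented for the example.
sample = [
    "ProtA\tProtB\t50\t100\t1\t80.5\t1e-30",
    "ProtA\tProtC\t50\t100\t1\t80.5\t1e-3",
]
links = build_links(sample, 1e-10)
buffer = io.StringIO()  # stands in for the real output file
write_sif(links, buffer)
print(buffer.getvalue(), end="")
```

With a real file, the StringIO buffer would be replaced by a file opened for writing, for example inside a with block so that it is closed automatically.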
RE: Similarity network - Absolumentpasadrien - Apr-05-2019

Thanks a lot for the help!

I tried to finish the function. First I tried to print:

    def sif(S):
        for key, value in dic().items():
            print(key,value)

It worked well. Here are a few lines from the terminal:

    CJA_0908 ['YC6258_00210', 'YC6258_00752', 'YC6258_00896', 'YC6258_00996', 'YC6258_01573', 'YC6258_02031', 'YC6258_02324', 'YC6258_03001', 'YC6258_03399', 'YC6258_04343', 'YC6258_04411', 'YC6258_04506', 'YC6258_04610', 'YC6258_04632', 'YC6258_04632', 'YC6258_05081']
    YC6258_05655 ['CJA_0941', 'CJA_0017']
    CJA_3636 ['YC6258_01391', 'YC6258_03115', 'YC6258_04565']
    CJA_2172 ['YC6258_02942', 'YC6258_04456']
    CJA_0705 ['YC6258_04451']
    CJA_2312 ['YC6258_02700', 'YC6258_04206', 'YC6258_05879']
    YC6258_04757 ['CJA_1954', 'CJA_0230', 'CJA_0040']
    CJA_1680 ['YC6258_04219']
    CJA_0768 ['YC6258_00482', 'YC6258_01334', 'YC6258_01763', 'YC6258_04120']
    YC6258_03292 ['CJA_0541', 'CJA_1040', 'CJA_2017', 'CJA_1630']
    CJA_2751 ['YC6258_01150', 'YC6258_04413', 'YC6258_04841']

So next I tried to write it to my output.sif with this code:

    def sif(S):
        for key, value in dic().items():
            L = value
            for item in L:
                prot = ""
                prot = prot + item
                prot = prot +"\n"
            fo.write(key+"<chain>"+prot)

It is a combination of a lot of attempts, but in the end the output is nonsense. Here are a few lines:

    YC6258_01638<chain>CJA_3820
    YC6258_01965<chain>CJA_0747
    YC6258_00851<chain>CJA_0422
    CJA_0668<chain>YC6258_05031
    CJA_1785<chain>YC6258_04983
    CJA_0104<chain>YC6258_05903
    YC6258_03597<chain>CJA_2152
    YC6258_00918<chain>CJA_2555
    YC6258_04616<chain>CJA_0025
    YC6258_00551<chain>CJA_3159
    YC6258_01798<chain>CJA_3676
    CJA_1060<chain>YC6258_05567
    CJA_2094<chain>YC6258_03387
    YC6258_01909<chain>CJA_0667
    YC6258_00274<chain>CJA_1830

Do you have an idea?

RE: Similarity network - DeaD_EyE - Apr-05-2019

In the function dic you can use a defaultdict. The difference from a normal dict is that a defaultdict creates and returns a default value (here, an empty list) when the key doesn't exist.
    from collections import defaultdict

    dd = defaultdict(list)
    dd['SOME_KEY'].append(42)
    print(dd)

Here is your dic function refactored with defaultdict, using iteration over the file together with tuple unpacking. The *_ consumes all the items that are left over.

    from collections import defaultdict

    def dic(input_file, S):
        result = defaultdict(list)
        with open(input_file) as fd:
            for line in fd:
                A, B, *_, E = list(element.strip() for element in line.split())
                if float(E) <= S:
                    result[A].append(B)
                    result[B].append(A)
        return result

The names A, B, E and S should be changed to more descriptive names. Just name them what they are, for example: left_protein, right_protein, threshold, (e_value?).

Is this output right?

    In [19]: dd = dic('input_file.txt', 30) # some testdata

    In [21]: for protein_left, protein_list in dd.items():
        ...:     for protein_right in protein_list:
        ...:         print(protein_left, '<==>', protein_right)
        ...:
    ProtA <==> ProtB
    ProtA <==> ProtC
    ProtB <==> ProtA
    ProtC <==> ProtA
    ProtD <==> ProtE
    ProtE <==> ProtD

PS: Post some real data together with the output you expect for it.
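One possible way to turn such a defaultdict into .sif lines is sketched below. It relies on the multi-target form shown in the first post (a source, a relationship, then several targets on one line). The function name write_sif, the "linked" default label, and the hard-coded pairs are assumptions for illustration only. Duplicate partners, like the repeated 'YC6258_04632' in the terminal output earlier in the thread, are dropped with set().

```python
from collections import defaultdict
import io

def write_sif(links, out, relation="linked"):
    """Write one SIF line per protein: source, relation, then all its partners."""
    for protein, partners in links.items():
        # sorted(set(...)) removes duplicate partners and keeps the output stable
        unique = sorted(set(partners))
        out.write(" ".join([protein, relation, *unique]) + "\n")

# Hypothetical pairs, standing in for the result of dic()
links = defaultdict(list)
for left, right in [("ProtA", "ProtB"), ("ProtA", "ProtC"), ("ProtD", "ProtE")]:
    links[left].append(right)
    links[right].append(left)

buffer = io.StringIO()  # stands in for the real output file
write_sif(links, buffer)
print(buffer.getvalue(), end="")
```

Each link still appears twice here (once from each side), because the dictionary stores both directions. Whether that is acceptable depends on how the network is meant to be read, so it is left unchanged in this sketch.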