Python Forum
Similarity network
#1
Hello all,

I have to write code that converts a .blastp file into a .sif file. For those of you who don't know these formats, a .blastp file looks like this:

ProtA ProtB xx xxxxx xx xx.xxx E-value
ProtA ProtC xx xxxxx xx xx.xxx E-value
ProtD ProtE xx xxxxx xx xx.xxx E-value
.
.
.

and a .sif file looks like this:

ProtA <relationship type> ProtB
ProtC <relationship type> ProtA
ProtD <relationship type> ProtE ProtF ProtB
.
.
.

where each Prot is a protein.

I need to ask the user for a threshold S. If S is greater than the E-value of a pair, the two proteins are linked and I have to write them to the .sif file as:
ProtA linked ProtB

I started by building a dictionary that stores each protein as a key, with the list of all proteins linked to it as the value. That worked well, but now I am stuck on writing the .sif file. Here is my code so far:

import re


namF = "xxxxxxxxxxx.blastp"
of = open(namF,'r')


Sif="Output.sif"
fo = open(Sif, 'w')

S = str(input('Choose the threshold  '))

dico={}

ligne = of.readline()
while (ligne != ""):
	ligne.rstrip('\n')
	L= ligne.split('\t')
	E= L[-1:]
	A=L[0]
	B=L[1]

	if E[0] <= S:
    
		if A in dico: 
			dico[A].append(B)
		else:
			dico[A]=list()
			dico[A].append(B)
              
		if B in dico: 
			dico[B].append(A)
		else:
			dico[B]=list()
			dico[B].append(A)
                 
    	for key in dico:
		fo.write(key+ ' lié '+ dico[key]+'\n')
	ligne= of.readline()
print(D)
Then I tried to split it into functions, as required:
import re


namF = "xxxxxxxxxxx.blastp"
of = open(namF,'r')


Sif="Output.sif"
fo = open(Sif, 'w')

S = str(input('Choose the threshold  '))


def sif(S):

		for key in dico:
			fo.write(key+ ' lié '+ dico[key]+'\n')


def dic():
	dico={}

	ligne = of.readline()
	while (ligne != ""):
		ligne.rstrip('\n')
		L= ligne.split('\t')
		E= L[-1:]
		A=L[0]
		B=L[1]

		if E[0] <= S:

			if A in dico: 
				dico[A].append(B)
			else:
				dico[A]=list()
				dico[A].append(B)
              
			if B in dico: 
				dico[B].append(A)
			else:
				dico[B]=list()
				dico[B].append(A)
		ligne= of.readline()

print(D)
I tried to use my dic function inside the sif one to create my Output.sif, but I can't manage to do it. Any help?

Thanks
#2
At the end of dic(), you need to return dico, then in sif() you can write
for key, value in dic().items():
    ...
Also don't forget to call the sif() function.
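A minimal sketch of that pattern, in case it helps. It is not your exact code: passing the lines and the threshold in as parameters instead of using globals is my own choice, converting the E-value with float() is an assumption about your data, and the two sample lines are made up.

```python
def dic(lines, threshold):
    """Build {protein: [linked proteins]} from whitespace-separated lines."""
    dico = {}
    for ligne in lines:
        fields = ligne.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        # Assumption: the E-value is the last column and parses as a float.
        a, b, e_value = fields[0], fields[1], float(fields[-1])
        if e_value <= threshold:
            dico.setdefault(a, []).append(b)
            dico.setdefault(b, []).append(a)
    return dico  # without this line, the caller gets None


def sif(lines, threshold):
    # dic() is called here, and its return value is what gets iterated.
    for key, value in dic(lines, threshold).items():
        print(key, value)


# Made-up sample data, only for illustration.
sample = ["ProtA\tProtB\t50\t1e-30\n", "ProtA\tProtC\t40\t0.5\n"]
sif(sample, 1e-5)
```

With a real file you could pass the open file object as lines, since iterating over it yields one line at a time.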
#3
Thanks a lot for the help!

I tried to finish the function. First I tried printing:

def sif(S):
	for key, value in dic().items():
			print(key,value)
It worked well; here are a few lines from the terminal:

CJA_0908 ['YC6258_00210', 'YC6258_00752', 'YC6258_00896', 'YC6258_00996', 'YC6258_01573', 'YC6258_02031', 'YC6258_02324', 'YC6258_03001', 'YC6258_03399', 'YC6258_04343', 'YC6258_04411', 'YC6258_04506', 'YC6258_04610', 'YC6258_04632', 'YC6258_04632', 'YC6258_05081']
YC6258_05655 ['CJA_0941', 'CJA_0017']
CJA_3636 ['YC6258_01391', 'YC6258_03115', 'YC6258_04565']
CJA_2172 ['YC6258_02942', 'YC6258_04456']
CJA_0705 ['YC6258_04451']
CJA_2312 ['YC6258_02700', 'YC6258_04206', 'YC6258_05879']
YC6258_04757 ['CJA_1954', 'CJA_0230', 'CJA_0040']
CJA_1680 ['YC6258_04219']
CJA_0768 ['YC6258_00482', 'YC6258_01334', 'YC6258_01763', 'YC6258_04120']
YC6258_03292 ['CJA_0541', 'CJA_1040', 'CJA_2017', 'CJA_1630']
CJA_2751 ['YC6258_01150', 'YC6258_04413', 'YC6258_04841']

Next I tried to write it to my Output.sif with this code:

def sif(S):
	for key, value in dic().items():
			L = value 
			for item in L:
				prot = ""
				prot = prot + item 
			prot = prot +"\n"
			fo.write(key+"<chain>"+prot)
It is a combination of many attempts, but in the end the output makes no sense. Here are a few lines:

YC6258_01638<chain>CJA_3820
YC6258_01965<chain>CJA_0747
YC6258_00851<chain>CJA_0422
CJA_0668<chain>YC6258_05031
CJA_1785<chain>YC6258_04983
CJA_0104<chain>YC6258_05903
YC6258_03597<chain>CJA_2152
YC6258_00918<chain>CJA_2555
YC6258_04616<chain>CJA_0025
YC6258_00551<chain>CJA_3159
YC6258_01798<chain>CJA_3676
CJA_1060<chain>YC6258_05567
CJA_2094<chain>YC6258_03387
YC6258_01909<chain>CJA_0667
YC6258_00274<chain>CJA_1830

Do you have an idea?
#4
In the function dic you can use a defaultdict.
The difference from a normal dict is that a defaultdict creates a default value (here an empty list) when the key doesn't exist yet.
from collections import defaultdict
dd = defaultdict(list)
dd['SOME_KEY'].append(42)
print(dd)
Here is your dic function refactored with defaultdict, iterating over the file and using tuple unpacking.
The *_ consumes all the items that are left over.

from collections import defaultdict


def dic(input_file, S):
    result = defaultdict(list)
    with open(input_file) as fd:
        for line in fd:
            A, B, *_, E = list(element.strip() for element in line.split())
            if float(E) <= S:
                result[A].append(B)
                result[B].append(A)
        
    return result
The names A, B, E and S should be changed to more descriptive ones.
Just name them what they are, for example: left_protein, right_protein, threshold, (e_value?).

Is this output right?
In [19]: dd = dic('input_file.txt', 30) # some testdata
In [21]: for protein_left, protein_list in dd.items(): 
    ...:     for protein_right in protein_list: 
    ...:         print(protein_left, '<==>', protein_right) 
    ...:                                                                        
ProtA <==> ProtB
ProtA <==> ProtC
ProtB <==> ProtA
ProtC <==> ProtA
ProtD <==> ProtE
ProtE <==> ProtD
PS: Post some real input data together with the matching expected output.
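For the writing step itself: in your last attempt, prot = "" sits inside the inner loop, so it is reset on every pass and only the last protein of each list survives. Below is a rough sketch that joins the whole list into one line instead. It assumes the layout you want is "source relation target1 target2 ..." as in your first post. The name write_sif and the default label "linked" are my own picks, and dropping duplicates with dict.fromkeys is optional.

```python
import io


def write_sif(links, out, relation="linked"):
    """Write one line per protein: source, relation, then all its targets."""
    for source, targets in links.items():
        # dict.fromkeys drops repeated targets while keeping their order.
        unique_targets = list(dict.fromkeys(targets))
        out.write(f"{source} {relation} {' '.join(unique_targets)}\n")


# Made-up data in the same shape as the dictionary returned by dic().
links = {"ProtA": ["ProtB", "ProtC", "ProtB"], "ProtD": ["ProtE"]}
buffer = io.StringIO()  # stands in for a real file while testing
write_sif(links, buffer)
sif_text = buffer.getvalue()
print(sif_text)
```

For a real run you would replace the StringIO with a file opened for writing, for example inside a with open("Output.sif", "w") block.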