Python Forum
finding items/comparison in/with a dictionary
#1
Hello,
I have file 1 with this kind of information: Name Definition for about 2,000 names --> I have created a dictionary with this file (name:definition)

I have file 2 with this kind of information: >Name \n Information about name for the same 2,000 names

The goal is to add the definition of file 1 to the correct name in file 2: >Name Definition \n Information

I wrote the code below, which is supposed to check whether the name is in the dictionary:
if line(name) in dict.keys(), but it doesn't seem to work. I printed line(name) and dict.keys() separately and they show the same characters, except that when I print the dictionary it adds brackets, as in ['name'].

(I also know that I make my code more complicated than it should be ... I need to learn how to generate functions!)
file1= open('c:/python27/annotation.txt', 'r')
dict={}
file2= open('c:/python27/protein_file.faa', 'r')
outfile=open('c:/python27/proteinandannotation.faa', 'w')
for line in file1:
	name=line.strip().split()
	dict={name[0]:name[1:]}
	for line in file2:
		if line.startswith('>'):
#		line=line.strip
			if line[1:28] in dict.keys():
				line=line.strip('\n')
				outfile.write(line + dict.values() + '\n')
		else:
			outfile.write(line)
#2
Print what you are comparing so you can see what is actually happening. Also, dict is the name of Python's built-in dictionary type, so give your dictionary a different name.
print("\n", line[1:28])
print(dict.keys())
if line[1:28] in dict.keys():
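A small sketch of why two printed values can look identical while the comparison still fails: a line read from a file keeps its trailing newline, which print hides but repr() shows. The sample ID, the sample definition, and the dictionary name annotation are assumptions for illustration, and the syntax is Python 3 rather than the Python 2 used in the thread.

```python
# Hypothetical sample data standing in for one header line and the dictionary.
annotation = {"fig|6666666.213038.peg.1": "Name=Leucyl-tRNA synthetase"}
line = ">fig|6666666.213038.peg.1\n"  # lines read from a file keep their "\n"

raw_id = line[1:]
print(repr(raw_id))            # repr() makes the trailing newline visible
print(raw_id in annotation)    # False: the newline is part of the string

clean_id = line[1:].strip()
print(repr(clean_id))
print(clean_id in annotation)  # True: "in" on a dict tests its keys
```

Testing membership with "in annotation" directly gives the same answer as "in annotation.keys()", so the call to keys() is optional.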
#3
Thanks!
I figured out that I had to strip the newline character at the end of the line.
This is my new code:
for key in annotation:
	for line in file2:
		if line.startswith('>'):
			gene=line[1:]
			gene=gene.strip('\n')
			if gene == key:
#4
Hello,
I have file 1 with this kind of information: "ID" "Definition" for about 2,000 IDs --> I have created a dictionary with this file (id:definition) --> dictionary is called annotation(name1:name2)

I have file 2 with this kind of information: ">" + "ID" + \n + "Information about ID" for the same 2,000 IDs

The goal is to add the definition of file 1 to the correct ID in file 2: ">" + "ID" + "Definition" + \n + "Information about ID"

I wrote the code below, which is supposed to check whether the ID in file 2 matches a key in the dictionary;
however, it only checks the first entry in the dictionary.

file1= open('c:/python27/annotation.txt', 'r')  #this has the ID and definition
annotation={}  #define the dictionary
file2= open('c:/python27/protein_file.faa', 'r') #this has the ID with information
outfile=open('c:/python27/proteinandannotation.faa', 'w') 
for line in file1: #here I populate the dictionary with ID:definition
	name=line.strip().split()
	annotation={name[0]:name[1:]}
	print annotation #the entire dictionary prints, so it is all there
	for key in annotation: #here I go through every key in the dictionary, but it only seems to go through 1
		print key #all the keys print
		for line in file2: #will check this file for the first line of each entry with the ">" +"ID" format
			print key #only the first key prints now
			if line.startswith('>'): #here I make sure that I am on the line with ID 
				gene=line[1:] #here I extract ID from line
				gene=gene.strip('\n') #and get rid of line return
				if gene == key: #here I compare the ID to the key value in the dictionary; it works well for 1
					line=line.strip('\n')
					outfile.write(line + str(annotation[key]) + '\n') #here I write the ID with the definition
			else:
				outfile.write(line) #If the line doesn't have an Id and it is just the information I just write it
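One likely contributor to the "only the first key" symptom, independent of how the dictionary is built: an open file is an iterator, so the inner loop over file2 consumes every line on its first pass and later passes see nothing. The sketch below uses io.StringIO as a stand-in for an open text file (an assumption made so the example runs without any files on disk) and Python 3 syntax.

```python
import io

# io.StringIO stands in for an open text file; real file objects iterate the same way.
file2 = io.StringIO(">id1\nAAAA\n>id2\nCCCC\n")

first_pass = [line for line in file2]
second_pass = [line for line in file2]  # the iterator is already at end of file

print(len(first_pass))   # 4
print(len(second_pass))  # 0

file2.seek(0)  # rewinding makes the lines available again
third_pass = [line for line in file2]
print(len(third_pass))   # 4
```

Rewinding with seek(0) inside the outer loop would work but rereads the whole file once per key; reading file 2 a single time and looking each ID up in the dictionary avoids the nested loop altogether.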
#5
On line 7 of your code (annotation={name[0]:name[1:]}), each iteration replaces the existing annotation dict with a new one that holds a single element.
Apart from that, your approach is inefficient. A better approach is to loop through file 1 once and build the dict.
Then iterate over file 2, reading 2 lines at a time (there are several possible ways to do this), parse them to extract the ID and info, then look up the corresponding element in the dict created from file 1 and write the result to a new file (file 3).
Not to mention that lines 19-20 (the else branch) will write out lines that do not start with > many, many times.
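The approach described above can be sketched roughly as follows. This is a minimal illustration, not the poster's own code: the function names load_annotations and annotate are hypothetical, io.StringIO objects stand in for the three files, a single space is assumed between the ID and the definition in the output header, and the syntax is Python 3. It processes file 2 line by line rather than two lines at a time, which is one of the possible variations.

```python
import io

def load_annotations(lines):
    """Build {ID: definition} from lines of the form 'ID definition text'."""
    annotation = {}
    for line in lines:
        parts = line.strip().split(None, 1)  # cut at the first whitespace run only
        if not parts:
            continue  # skip blank lines
        definition = parts[1] if len(parts) > 1 else ""
        annotation[parts[0]] = definition  # add an entry; do not rebind the dict
    return annotation

def annotate(fasta_lines, annotation, out):
    """Copy lines to out, appending the definition to each '>' header line."""
    for line in fasta_lines:
        if line.startswith(">"):
            gene = line[1:].strip()
            definition = annotation.get(gene, "")  # unknown IDs are left unchanged
            header = ">" + gene + (" " + definition if definition else "")
            out.write(header + "\n")
        else:
            out.write(line)  # sequence lines are copied exactly once

# In-memory stand-ins for file 1, file 2 and the output file.
file1 = io.StringIO(
    "fig|6666666.213038.peg.1 Name=Leucyl-tRNA synthetase (EC 6.1.1.4)\n"
    "fig|6666666.213038.peg.2 Name=hypothetical protein\n"
)
file2 = io.StringIO(
    ">fig|6666666.213038.peg.1\nMQPNGVDLYV\nKSLGNVVNPD\n"
    ">fig|6666666.213038.peg.2\nMIIPFQFIQN\n"
)
outfile = io.StringIO()

annotate(file2, load_annotations(file1), outfile)
print(outfile.getvalue())
```

With real files, the same two functions could be given objects returned by open() in place of the StringIO stand-ins. Each file is read once, and every header costs a single dictionary lookup.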
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

#6
Can you upload a sample of the files?
#7
Thanks for your time and help.

Here is a bit of file 1 with ID plus definitions:
fig|6666666.213038.peg.1 Name=Leucyl-tRNA synthetase (EC 6.1.1.4) Ontology_term=KEGG_ENZYME:6.1.1.4
fig|6666666.213038.peg.2 Name=hypothetical protein
fig|6666666.213038.peg.3 Name=peptidoglycan-associated lipoprotein%2C OmpA family
fig|6666666.213038.peg.4 Name=Crossover junction endodeoxyribonuclease RuvC (EC 3.1.22.4) Ontology_term=KEGG_ENZYME:3.1.22.4
fig|6666666.213038.peg.5 Name=FIG000859: hypothetical protein YebC
fig|6666666.213038.peg.6 Name=Phage protein

A bit of file 2, with >ID plus info about ID (this is a biology file)
>fig|6666666.213038.peg.1
MQPNGVDLYVGGAEHAVLHLLYARFWHKVLYDYGYVSTPEPFYKLFHQGLILGEDGRKMS
KSLGNVVNPDEVVQKYGADSLRMFEMFLGPLQDTKPWSTTGIEGINRFLNRIWRLFIDEN
GNLNPNIEDLPLTPEQEYILHSTIKKVTEDIENLRFNTAIAQMMIFVNEFYKFEKKPKEA
LKKFLLILAPFAPHISEELWHKLGYSESIFTYSFPEFDEHKAIKKEVEIVVQINSKIRAR
INVPIDTPENEVLDIAKSEPNVQKYLAGKEIRKVIFVPNKILNIII
>fig|6666666.213038.peg.2
MIIPFQFIQNLQKYNKNFCSKNIDKHFQNKFTREMSLSLYFEIYVKFFC
>fig|6666666.213038.peg.3
MIYRNXFTIIFAVIFLSVSLLSAKPNXKSSKTENASEQYFLWGXVIEDANSPNTPNTPTQ
DQPQIFDKSKIRIINLGPVVNWKGLDYAPTISADGKTLYFVSNRPGSKINPKTDKPSHDF
WATKKNDRLDTIFFKPFNIDTTTIWGYQGVNTPENEGVASIAADGQTLYFTACSRPDGFG
DCDIYKTTIEGDKWSRPYNLGPNVNSKYFDAQPSIAPDQSRLYFVSTRPGPNSDGNNENW
DNMDIWYSDFDPETEEWLPAKNLTEINTPVQDCGPFIAADNQTLFFSSKGHQPNYGGLDF
YVTRYDPVTKKWSKPENLGIPLNTPQDDQFITLPASGDVLYFSSRRKDIPGYQGDFDIFM
AFVPSYFRAVVVKTTVIDECSGENIPAIVTIKSPIINRVVVDTLKATRTEIDFVVSNTDY
GDPRDSIKFVNLEITAENPKYGKTTKIVRVDKPKPTTDPEEAKKYADVINVVIPLGQRPV
IGAEIEEAKYVAENKKIKPEIANFRGLVMEQFQTWDLYPLLNYVFFDAGSSKIPDRYILF
KSPDDKFKKAFTDTTIRGGTLEKYYHILNIYGYRLNKYPEAKIEIVGCNDGKTPEEKRPN
LSKERAEAVFKYLRDVWGIDEKRMKITVRNQPAVVSNLNDSLGIVENRRVEILCNDWNIM
KPVFDKDPKTFPQPETMNFTLKNGIEDALVKARRIEVKRGGREWNTLKDIGVVENKYTWD
WKSSEGEYPKDEVPFTAQLIVTTINDKECSSDPIMIPVMQVTTEQKKVDIQKGAKDSTIE
RYSLILFPFDRSDAGPINERIMREYVYNRVLPTSYVEVVGHTDVVGLYEHNQALSERRAT
TVYNGIMQQTKGKVGYINKRGVGEDEPLYDNSLPEGRFYNRTVQVIIKTPVESWEQLGGG
K
>fig|6666666.213038.peg.4
MLILGIDPGSVKCGFGIVDFEGFPPKIIKVGLIRPKFKDKFHFLDKLKFIYDELNSLLDY
FDIVETAVESQFYSKNPQSLMKLTQAKTVVELAMLNRNIPVFEYSPREIKLAITGRGGAT
KKSVQYMVESIFDVNLKNKTTDISDALAVALCHISRKVNLNTKRNSPRNWREFVQMNPER
VISQ

And this is what I get with my code. It works beautifully for the first entry, but since it only looks at the first key, it never matches the other IDs and doesn't print them.

>fig|6666666.213038.peg.1['Name=Leucyl-tRNA', 'synthetase', '(EC', '6.1.1.4)', 'Ontology_term=KEGG_ENZYME:6.1.1.4']
MQPNGVDLYVGGAEHAVLHLLYARFWHKVLYDYGYVSTPEPFYKLFHQGLILGEDGRKMS
KSLGNVVNPDEVVQKYGADSLRMFEMFLGPLQDTKPWSTTGIEGINRFLNRIWRLFIDEN
GNLNPNIEDLPLTPEQEYILHSTIKKVTEDIENLRFNTAIAQMMIFVNEFYKFEKKPKEA
LKKFLLILAPFAPHISEELWHKLGYSESIFTYSFPEFDEHKAIKKEVEIVVQINSKIRAR
INVPIDTPENEVLDIAKSEPNVQKYLAGKEIRKVIFVPNKILNIII
MIIPFQFIQNLQKYNKNFCSKNIDKHFQNKFTREMSLSLYFEIYVKFFC
MIYRNXFTIIFAVIFLSVSLLSAKPNXKSSKTENASEQYFLWGXVIEDANSPNTPNTPTQ
DQPQIFDKSKIRIINLGPVVNWKGLDYAPTISADGKTLYFVSNRPGSKINPKTDKPSHDF
WATKKNDRLDTIFFKPFNIDTTTIWGYQGVNTPENEGVASIAADGQTLYFTACSRPDGFG
DCDIYKTTIEGDKWSRPYNLGPNVNSKYFDAQPSIAPDQSRLYFVSTRPGPNSDGNNENW
DNMDIWYSDFDPETEEWLPAKNLTEINTPVQDCGPFIAADNQTLFFSSKGHQPNYGGLDF
YVTRYDPVTKKWSKPENLGIPLNTPQDDQFITLPASGDVLYFSSRRKDIPGYQGDFDIFM
AFVPSYFRAVVVKTTVIDECSGENIPAIVTIKSPIINRVVVDTLKATRTEIDFVVSNTDY
GDPRDSIKFVNLEITAENPKYGKTTKIVRVDKPKPTTDPEEAKKYADVINVVIPLGQRPV
IGAEIEEAKYVAENKKIKPEIANFRGLVMEQFQTWDLYPLLNYVFFDAGSSKIPDRYILF
KSPDDKFKKAFTDTTIRGGTLEKYYHILNIYGYRLNKYPEAKIEIVGCNDGKTPEEKRPN
LSKERAEAVFKYLRDVWGIDEKRMKITVRNQPAVVSNLNDSLGIVENRRVEILCNDWNIM
KPVFDKDPKTFPQPETMNFTLKNGIEDALVKARRIEVKRGGREWNTLKDIGVVENKYTWD
WKSSEGEYPKDEVPFTAQLIVTTINDKECSSDPIMIPVMQVTTEQKKVDIQKGAKDSTIE
RYSLILFPFDRSDAGPINERIMREYVYNRVLPTSYVEVVGHTDVVGLYEHNQALSERRAT
TVYNGIMQQTKGKVGYINKRGVGEDEPLYDNSLPEGRFYNRTVQVIIKTPVESWEQLGGG
K
MLILGIDPGSVKCGFGIVDFEGFPPKIIKVGLIRPKFKDKFHFLDKLKFIYDELNSLLDY
FDIVETAVESQFYSKNPQSLMKLTQAKTVVELAMLNRNIPVFEYSPREIKLAITGRGGAT
KKSVQYMVESIFDVNLKNKTTDISDALAVALCHISRKVNLNTKRNSPRNWREFVQMNPER
VISQ
MSGHSKWANIKHKKAAKDAKRGKLFTRLAKEITIAAREGGGDPEANPRLRLAIQNAKAEN
MPMENIKRAIQRGTGEIQGENYEEVIYEGYAPLGVAVILEAITDNRNRTYPLIRSEVNKL
GGSIGEPGSVMWNFTRKGVIYIDPQGLTEEQMLEHILEAGCEDMEYDEERTRVICAFEDM
VACQKYFEDKKFKILESKFEYIPKTTVKIDNIEAARKVLKFFDTLEELDDVQNVYGNYEF
TDEVLSQLEKEQN
#8
Do the files have headers?
Am I right that the delimiter in the first file is |?
#9
Nope, no headers.

I am not sure what you mean by delimiter. The first ID (key) would be: fig|6666666.213038.peg.1 and the first definition (value): Name=Leucyl-tRNA synthetase (EC 6.1.1.4) Ontology_term=KEGG_ENZYME:6.1.1.4
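To illustrate the delimiter question with the sample line quoted above: the ID and the definition are separated by the first run of whitespace, and the definition itself contains more spaces. The sketch below (Python 3 syntax; the variable names are illustrative) shows how split() with no arguments turns the definition into a list of words, which would explain the brackets and quotes in the output, and how limiting the split to one cut keeps it as a single string.

```python
line = ("fig|6666666.213038.peg.1 Name=Leucyl-tRNA synthetase (EC 6.1.1.4) "
        "Ontology_term=KEGG_ENZYME:6.1.1.4\n")

# split() with no arguments breaks on every run of whitespace, so the
# definition ends up as a list of words.
words = line.strip().split()
print(words[0])
print(words[1:])  # a list; str() of it shows brackets and quotes

# Limiting the split to one cut keeps the definition as a single string.
key, definition = line.strip().split(None, 1)
print(key)
print(definition)

# An existing list of words can also be joined back into one string.
print(" ".join(words[1:]) == definition)
```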
#10
Also, in the second file, is this:
MQPNGVDLYVGGAEHAVLHLLYARFWHKVLYDYGYVSTPEPFYKLFHQGLILGEDGRKMS
KSLGNVVNPDEVVQKYGADSLRMFEMFLGPLQDTKPWSTTGIEGINRFLNRIWRLFIDEN
GNLNPNIEDLPLTPEQEYILHSTIKKVTEDIENLRFNTAIAQMMIFVNEFYKFEKKPKEA
LKKFLLILAPFAPHISEELWHKLGYSESIFTYSFPEFDEHKAIKKEVEIVVQINSKIRAR
INVPIDTPENEVLDIAKSEPNVQKYLAGKEIRKVIFVPNKILNIII
a single line or multiple lines?
Can you attach the files?
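Whether a sequence is one line or several matters for any approach that reads file 2 in fixed pairs of lines. A hedged sketch of a reader that does not depend on the answer is shown below: it groups every line after a ">" header into that record until the next header appears. The function name read_records and the short sample data are hypothetical, and the syntax is Python 3.

```python
import io

def read_records(lines):
    """Yield (header, sequence) pairs; a sequence may span several lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # finish the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # finish the last record

# In-memory stand-in for file 2: one two-line sequence and one single-line sequence.
sample = io.StringIO(">id1\nMQPN\nKSLG\n>id2\nMIIP\n")
records = list(read_records(sample))
print(records)
```

Note that joining the chunks discards the original line breaks inside each sequence, so this form suits lookups and comparisons more than copying the file through unchanged.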