Python Forum
Looping through dictionary and comparing values with elements of a separate list.




Looping through dictionary and comparing values with elements of a separate list. - Mr_Keystrokes - Jun-22-2018

Apologies for the long question. Basically, I am trying to loop through a dictionary I've constructed and check whether both samples in each entry (the first two elements of each value) are in a given list.

Test script:
Hash_Isolates={
	"1" : ['L02476-16_P_R1', 'AE006468', '873'],
	"2" : ['AE006468', 'AE006468', '40'],
	"3" : ['AE006468', 'L02476-16_P_R1', '756'],
	"4" : ['L00409-17_R1', 'L02476-16_P_R1', '987'],
	"5" : ['L00817-17_R1', 'AE006468', '65']
}

new_isolateList=['AE006468', 'L00817-17_R1']

my_Isolates=[]

for i in Hash_Isolates:
	if Hash_Isolates[i][0] in new_isolateList and Hash_Isolates[i][1] in new_isolateList:
		my_Isolates.append(Hash_Isolates[i])

print(len(my_Isolates))
Strangely, when I run this test script it works (it prints 2), but when I run the real script it doesn't:

#!/usr/bin/env python2.7

import getpass
import sys
import re


isolateFile=sys.argv[1]
snapper_data=sys.argv[2]

## Get the user ID ##
def get_User():
    currentUser = getpass.getuser()
    return currentUser


isolatePath='/home/'+get_User()+'/path/to/file/'+isolateFile
dataPath='/home/'+get_User()+'/path/to/file/'+snapper_data


# Retrieve isolates from file

isolateList=[]
with open(isolatePath, 'r') as file:
	isolateList=file.readlines()

new_isolateList=[]
for i in isolateList:
	try:
	    x=re.search('(\w.....-?.?.?\d?)', str(i)).group(1)
	except:
		pass
	new_isolateList.append(x)

all_results=[]

with open(dataPath, 'r') as file:
	all_results=file.readlines()


# w is the position in the list of the samples being compared from the whole file
# x is first sample in comparison
# y is the second sample in comparison
# z is the SNP distance between the first and second samples
Hash_Isolates={}
for i in all_results:
	w=re.search('(.?.?.?.?.?.?.?),.+,.+,\d+\n', str(i)).group(1)
	x=re.search('.?.?.?.?.?.?.?,(.+),.+,\d+\n', str(i)).group(1)
	y=re.search('.?.?.?.?.?.?.?,.+,(.+),\d+\n', str(i)).group(1)
	z=re.search('.?.?.?.?.?.?.?,.+,.+,(\d+)\n', str(i)).group(1)
	Hash_Isolates[w]=[x, y, z]

my_Isolates=[]

for i in Hash_Isolates:
	if Hash_Isolates[i][0] in new_isolateList and Hash_Isolates[i][1] in new_isolateList:
		my_Isolates.append(Hash_Isolates[i])


print(len(my_Isolates))
I expect this to work the same way, but it prints 0. The snapper_data file has 100k+ lines.
The data looks like this. The isolateFile is a text file:

L01121-17_R1
AE006468
L00817-17_R1
L00665-17_R1

The snapper_data file is a CSV file:

1,L02476-16_P_R1,AE006468,873
2,L02476-16_P_R1,L02888-16_P_R1,2
3,L02476-16_P_R1,L00541-14_P_R1,914
4,L02476-16_P_R1,L02471-16_P_R1,842
5,AE006468,L02888-16_P_R1,832
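Running the isolate-file regex from the script on the sample names above suggests why it likely prints 0. The helper `extract_isolate` below is hypothetical; it only wraps the script's own pattern so the captured text can be compared with the name as written in the file.

```python
import re

# The pattern the script applies to each line of the isolate file
PATTERN = r'(\w.....-?.?.?\d?)'

def extract_isolate(line):
    """Hypothetical helper: apply the script's regex to a single line."""
    match = re.search(PATTERN, line)
    return match.group(1) if match else None

for raw in ['L01121-17_R1\n', 'AE006468\n', 'L00817-17_R1\n', '\n']:
    # Compare the name as written in the file with what the regex captures
    print(repr(raw.strip()), '->', repr(extract_isolate(raw)))
```

The `_R1` suffix is cut off (`L01121-17_R1` becomes `L01121-17`), so for the lines shown only a row pairing AE006468 with itself could ever match the full names in the CSV. A blank line gives no match; in the original loop `.group(1)` then raises inside `try`, and the bare `except: pass` lets the append re-add whatever `x` held from the previous line. Using `line.strip()` instead of the regex avoids both problems.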

I really want to get a good command of dictionaries, but this is bugging me.
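Since the goal is practice with dictionaries, here is a minimal sketch that keeps the original dictionary-based structure but swaps the regexes for `str.strip()` and `csv.reader`. It assumes the isolate file has one name per line and the CSV has exactly four columns (id, sample 1, sample 2, SNP distance) with no header. The data is inlined with `io.StringIO` so the example runs on its own: the isolate names come from the isolate file shown above, and the CSV rows are the test script's `Hash_Isolates` entries written out as CSV.

```python
import csv
import io

# Inlined sample data; a real script would use open(isolatePath) and open(dataPath)
isolate_text = "L01121-17_R1\nAE006468\nL00817-17_R1\nL00665-17_R1\n"
snapper_text = (
    "1,L02476-16_P_R1,AE006468,873\n"
    "2,AE006468,AE006468,40\n"
    "3,AE006468,L02476-16_P_R1,756\n"
    "4,L00409-17_R1,L02476-16_P_R1,987\n"
    "5,L00817-17_R1,AE006468,65\n"
)

# Strip the newline instead of using a regex, and skip blank lines.
# A set gives fast membership tests even with many isolates.
new_isolates = {line.strip() for line in io.StringIO(isolate_text) if line.strip()}

# Build the dictionary: row id -> [sample 1, sample 2, SNP distance]
hash_isolates = {}
for row in csv.reader(io.StringIO(snapper_text)):
    if len(row) != 4:
        continue  # skip empty or malformed rows
    row_id, first, second, distance = row
    hash_isolates[row_id] = [first, second, distance]

# Keep entries where both samples appear in the isolate file
my_isolates = [
    value for value in hash_isolates.values()
    if value[0] in new_isolates and value[1] in new_isolates
]
print(len(my_isolates))
```

With this data it selects the same two entries as the test script (ids 2 and 5).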


RE: Looping through dictionary and comparing values with elements of a separate list. - buran - Jun-22-2018

Well, it looks like you're overcomplicating things:

import csv

with open('isolate_file.txt') as f:
    new_isolate = {line.strip() for line in f}

with open('snapper_data.csv') as sd:
    rdr = csv.reader(sd)
    my_isolates = [line[1:] for line in rdr if len(set(line[1:-1]) & new_isolate)==2]
             
print(my_isolates)
snapper_data.csv:
1,L02476-16_P_R1,AE006468,873
2,L02476-16_P_R1,L02888-16_P_R1,2
3,L02476-16_P_R1,L00541-14_P_R1,914
4,L02476-16_P_R1,L02471-16_P_R1,842
5,AE006468,L02888-16_P_R1,832
6,L01121-17_R1,AE006468,100

isolate_file.txt:
L01121-17_R1
AE006468
L00817-17_R1
L00665-17_R1

Output:
[['L01121-17_R1', 'AE006468', '100']]



RE: Looping through dictionary and comparing values with elements of a separate list. - Mr_Keystrokes - Jun-22-2018

You see, this is why I like Python: there are so many simpler ways of doing things, you just have to know them. Now I've got to decipher what you've written. Thanks.

 
len(set(line[1:-1]) & new_isolate)==2
Hmm, this doesn't take into account the instances where the 2 elements being compared are the same.
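The objection is easy to confirm. Building a set from a row whose two names are identical collapses them into one element, so the intersection can have at most one member and the `== 2` test rejects the row even though the name is in the isolate file. The isolate names below are from the isolate file in this thread; the identical-name row is entry 2 of the test script, and the mixed row is row 6 of the CSV in the reply above.

```python
# Isolate names from the isolate file shown in the thread
new_isolate = {'L01121-17_R1', 'AE006468', 'L00817-17_R1', 'L00665-17_R1'}

# Rows in csv.reader form: [id, sample 1, sample 2, SNP distance]
mixed_row = ['6', 'L01121-17_R1', 'AE006468', '100']
same_row = ['2', 'AE006468', 'AE006468', '40']

for row in (mixed_row, same_row):
    names = set(row[1:-1])          # the two sample names, deduplicated
    common = names & new_isolate    # names that also appear in the isolate file
    print(row[0], len(names), len(common), len(common) == 2)
```

For the mixed row both sizes are 2 and the check passes; for the identical row the set has a single element, so the check fails.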


RE: Looping through dictionary and comparing values with elements of a separate list. - buran - Jun-22-2018

(Jun-22-2018, 01:12 PM)Mr_Keystrokes Wrote: Hmm, this doesn't take into account the instances where the 2 elements being compared are the same.

Let's move the check into a function:
import csv

def check_line(line, isolate):
    # True when every distinct name in the row is in isolate, so identical pairs pass too
    my_set = set(line)
    return len(my_set & isolate) == len(my_set)

with open('isolate_file.txt') as f:
    new_isolate = {line.strip() for line in f}

with open('snapper_data.csv') as sd:
    rdr = csv.reader(sd)
    my_isolates = [line[1:] for line in rdr if check_line(line[1:-1], new_isolate)]
             
print(my_isolates)
You can also do it like this:
import csv

with open('isolate_file.txt') as f:
    new_isolate = {line.strip() for line in f}

with open('snapper_data.csv') as sd:
    rdr = csv.reader(sd)
    my_isolates = [line[1:] for line in rdr if line[1] in new_isolate and line[2] in new_isolate]
             
print(my_isolates)    
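Both fixes accept the identical-name row. `check_line` compares the size of the intersection with the size of the row's own set, which amounts to asking whether that set is a subset of the isolates; Python's `set.issubset` (or `<=`) expresses that directly. The sketch below compares the three checks on a few rows from this thread; `check_subset` and `check_both` are hypothetical names for the alternatives.

```python
new_isolate = {'L01121-17_R1', 'AE006468', 'L00817-17_R1', 'L00665-17_R1'}

def check_line(line, isolate):
    # Intersection-size check from the first snippet
    my_set = set(line)
    return len(my_set & isolate) == len(my_set)

def check_subset(line, isolate):
    # Hypothetical equivalent: is every name in the row also in isolate?
    return set(line).issubset(isolate)

def check_both(line, isolate):
    # Hypothetical wrapper around the explicit membership test from the last snippet
    return line[0] in isolate and line[1] in isolate

rows = [
    ['1', 'L02476-16_P_R1', 'AE006468', '873'],
    ['2', 'AE006468', 'AE006468', '40'],
    ['6', 'L01121-17_R1', 'AE006468', '100'],
]

for row in rows:
    names = row[1:-1]  # just the two sample names
    print(row[0], check_line(names, new_isolate),
          check_subset(names, new_isolate), check_both(names, new_isolate))
```

All three agree on these rows. The explicit test looks at exactly two positions, while the set-based checks work for any number of name columns.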



RE: Looping through dictionary and comparing values with elements of a separate list. - Mr_Keystrokes - Jun-22-2018

Thanks, I like the last solution best. I didn't know about csv.reader, so that will be useful in the future, and I would never have looked up set(). I have to say it's much simpler than Perl.


RE: Looping through dictionary and comparing values with elements of a separate list. - wavic - Jun-22-2018

(Jun-22-2018, 02:49 PM)Mr_Keystrokes Wrote: I have to say it's much simpler than Perl.
No kidding? :D
“Perl – The only language that looks the same before and after RSA encryption.”