Looping through dictionary and comparing values with elements of a separate list.

Mr_Keystrokes · Jun-22-2018, 12:05 PM

Apologies for the long question. Basically, I am trying to loop through a dictionary I've constructed and check whether specific elements of each value in the hash are in a given list.

Test script:

Hash_Isolates={
    "1" : ['L02476-16_P_R1', 'AE006468', '873'],
    "2" : ['AE006468', 'AE006468', '40'],
    "3" : ['AE006468', 'L02476-16_P_R1', '756'],
    "4" : ['L00409-17_R1', 'L02476-16_P_R1', '987'],
    "5" : ['L00817-17_R1', 'AE006468', '65']
}
 
new_isolateList=['AE006468', 'L00817-17_R1']
 
my_Isolates=[]
 
for i in Hash_Isolates:
    if Hash_Isolates[i][0] in new_isolateList and Hash_Isolates[i][1] in new_isolateList:
        my_Isolates.append(Hash_Isolates[i])
 
print(len(my_Isolates))

Strangely, this test script works (it prints 2), but when I run the proper script it doesn't:

#!/usr/bin/env python2.7
 
import getpass
import sys
import re
 
 
isolateFile=sys.argv[1]
snapper_data=sys.argv[2]
 
## Get the user ID ##
def get_User():
    currentUser = getpass.getuser()
    return currentUser
 
 
isolatePath='/home/'+get_User()+'/path/to/file/'+isolateFile
dataPath='/home/'+get_User()+'/path/to/file/'+snapper_data
 
 
# Retrieve isolates from file
 
isolateList=[]
with open(isolatePath, 'r') as file:
    isolateList=file.readlines()
 
new_isolateList=[]
for i in isolateList:
    try:
        x=re.search('(\w.....-?.?.?\d?)', str(i)).group(1)
    except:
        pass
    new_isolateList.append(x)
 
all_results=[]
 
with open(dataPath, 'r') as file:
    all_results=file.readlines()
 
 
# w is the position in the list of the samples being compared from the whole file
# x is first sample in comparison
# y is the second sample in comparison
# z is the SNP distance between the first and second samples
Hash_Isolates={}
for i in all_results:
    w=re.search('(.?.?.?.?.?.?.?),.+,.+,\d+\n', str(i)).group(1)
    x=re.search('.?.?.?.?.?.?.?,(.+),.+,\d+\n', str(i)).group(1)
    y=re.search('.?.?.?.?.?.?.?,.+,(.+),\d+\n', str(i)).group(1)
    z=re.search('.?.?.?.?.?.?.?,.+,.+,(\d+)\n', str(i)).group(1)
    Hash_Isolates[w]=[x, y, z]
 
my_Isolates=[]
 
for i in Hash_Isolates:
    if Hash_Isolates[i][0] in new_isolateList and Hash_Isolates[i][1] in new_isolateList:
        my_Isolates.append(Hash_Isolates[i])
 
 
print(len(my_Isolates))

I expect this to work the same way, but it prints 0. The snapper_data file has 100k+ lines. The data looks like this.

The isolateFile is a text file:

L01121-17_R1
AE006468
L00817-17_R1
L00665-17_R1

The snapper_data file is a CSV file:

1,L02476-16_P_R1,AE006468,873
2,L02476-16_P_R1,L02888-16_P_R1,2
3,L02476-16_P_R1,L00541-14_P_R1,914
4,L02476-16_P_R1,L02471-16_P_R1,842
5,AE006468,L02888-16_P_R1,832
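One plausible reason for the 0 (an assumption from reading the script, not something confirmed in the thread): the regex `(\w.....-?.?.?\d?)` used to clean the isolate IDs matches a fixed number of characters, so an ID such as `L00817-17_R1` is cut down to `L00817-17` and can never equal the full ID in the CSV. A quick check on the sample isolateFile lines, comparing the regex with simply stripping the newline:

```python
import re

# Sample lines as readlines() would return them (trailing newlines included)
isolate_lines = ['L01121-17_R1\n', 'AE006468\n', 'L00817-17_R1\n', 'L00665-17_R1\n']

# Pattern copied from the script above, written as a raw string
pattern = r'(\w.....-?.?.?\d?)'
via_regex = [re.search(pattern, line).group(1) for line in isolate_lines]
print(via_regex)
# ['L01121-17', 'AE006468', 'L00817-17', 'L00665-17']  <- the '_R1' suffix is lost

# Stripping surrounding whitespace keeps the whole ID
via_strip = [line.strip() for line in isolate_lines]
print(via_strip)
# ['L01121-17_R1', 'AE006468', 'L00817-17_R1', 'L00665-17_R1']

# A full ID from the CSV is only found in the stripped list
print('L00817-17_R1' in via_regex, 'L00817-17_R1' in via_strip)  # False True
```

Note that none of the five sample CSV rows has both isolates in the list, so those rows alone would give 0 either way; the truncation matters for the full file.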

I'm really desperate to get a command of dictionaries, but this is bugging me.

**buran** · Jun-22-2018, 12:52 PM

Well, it looks like you're overcomplicating things.

import csv
 
with open('isolate_file.txt') as f:
    new_isolate = {line.strip() for line in f}
 
with open('snapper_data.csv') as sd:
    rdr = csv.reader(sd)
    my_isolates = [line[1:] for line in rdr if len(set(line[1:-1]) & new_isolate)==2]
print(my_isolates)

snapper_data.csv

1,L02476-16_P_R1,AE006468,873
2,L02476-16_P_R1,L02888-16_P_R1,2
3,L02476-16_P_R1,L00541-14_P_R1,914
4,L02476-16_P_R1,L02471-16_P_R1,842
5,AE006468,L02888-16_P_R1,832
6,L01121-17_R1,AE006468,100

isolate_file.txt

L01121-17_R1
AE006468
L00817-17_R1
L00665-17_R1

Output:

[['L01121-17_R1', 'AE006468', '100']]

Mr_Keystrokes · Jun-22-2018, 01:12 PM (last modified 01:28 PM)

You see, this is why I like Python: there are so many simpler ways of doing things; you just have to know them. Now I've got to decipher what you've written. Thanks.

len(set(line[1:-1]) & new_isolate)==2

Hmm, this doesn't take into account the instances where the 2 elements being compared are the same.
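To see the problem: converting the pair to a set removes duplicates, so a row whose two isolates are identical becomes a one-element set and can never pass the `== 2` test, even when that isolate is in the list. A small check using a CSV row shaped like entry "2" of the test dictionary (the row itself is illustrative):

```python
# The isolate set built from isolate_file.txt above
new_isolate = {'L01121-17_R1', 'AE006468', 'L00817-17_R1', 'L00665-17_R1'}

# Illustrative CSV row where both isolates are the same
row = ['2', 'AE006468', 'AE006468', '40']

pair = set(row[1:-1])
print(pair)                          # {'AE006468'}
print(len(pair & new_isolate) == 2)  # False, so the row is dropped
```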

**buran** · Jun-22-2018, 01:46 PM

(Jun-22-2018, 01:12 PM)Mr_Keystrokes Wrote: Hmm, this doesn't take into account the instances where the 2 elements being compared are the same.

Let's move the check into a function:

import csv
 
def check_line(line, isolate):
    my_set = set(line)
    return len(my_set & isolate) == len(my_set)
 
with open('isolate_file.txt') as f:
    new_isolate = {line.strip() for line in f}
 
with open('snapper_data.csv') as sd:
    rdr = csv.reader(sd)
    my_isolates = [line[1:] for line in rdr if check_line(line[1:-1], new_isolate)]
print(my_isolates)
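`check_line` returns True exactly when every isolate in the row is also in `isolate`, which is what `set.issubset` (or the `<=` operator) tests directly. A sketch of that variant; `check_line_subset` is a hypothetical name, the in-memory data stands in for snapper_data.csv, and row 7 is a made-up row added to show the identical-pair case:

```python
import csv
import io

def check_line_subset(line, isolate):
    # True when every element of line is a member of the isolate set
    return set(line) <= isolate

new_isolate = {'L01121-17_R1', 'AE006468', 'L00817-17_R1', 'L00665-17_R1'}

# Stand-in for snapper_data.csv: rows 1, 5 and 6 from above plus a made-up row 7
data = io.StringIO(
    "1,L02476-16_P_R1,AE006468,873\n"
    "5,AE006468,L02888-16_P_R1,832\n"
    "6,L01121-17_R1,AE006468,100\n"
    "7,AE006468,AE006468,40\n"
)

my_isolates = [line[1:] for line in csv.reader(data)
               if check_line_subset(line[1:-1], new_isolate)]
print(my_isolates)
# [['L01121-17_R1', 'AE006468', '100'], ['AE006468', 'AE006468', '40']]
```

As with `check_line`, an empty row would also pass (the empty set is a subset of everything), so add an `if line and ...` guard if the file may contain blank lines.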

You can also do it like this:

import csv
 
with open('isolate_file.txt') as f:
    new_isolate = {line.strip() for line in f}
 
with open('snapper_data.csv') as sd:
    rdr = csv.reader(sd)
    my_isolates = [line[1:] for line in rdr if line[1] in new_isolate and line[2] in new_isolate]
print(my_isolates)
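Since the original goal was practising with dictionaries, the same filter can keep the row number from the first column as the key, much like `Hash_Isolates` in the question. A sketch assuming the four-column layout shown above (row number, two isolates, SNP distance); the in-memory data stands in for the file, and `hash_isolates` is an illustrative name:

```python
import csv
import io

new_isolate = {'L01121-17_R1', 'AE006468', 'L00817-17_R1', 'L00665-17_R1'}

# Stand-in for snapper_data.csv, using rows from the example above
data = io.StringIO(
    "1,L02476-16_P_R1,AE006468,873\n"
    "5,AE006468,L02888-16_P_R1,832\n"
    "6,L01121-17_R1,AE006468,100\n"
)

# Map row number -> [first isolate, second isolate, SNP distance]; skip blank rows
hash_isolates = {row[0]: row[1:] for row in csv.reader(data) if row}

# Keep entries whose two isolates are both in the wanted set
my_isolates = {key: value for key, value in hash_isolates.items()
               if value[0] in new_isolate and value[1] in new_isolate}
print(my_isolates)  # {'6': ['L01121-17_R1', 'AE006468', '100']}
```

`.items()` yields each key with its value, so there is no need to index back into the dictionary as in `Hash_Isolates[i][0]`.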

Mr_Keystrokes · Jun-22-2018, 02:49 PM

Thanks, I like the last solution best. I didn't know about csv.reader, so that will be useful in the future. And I would never have looked up set(). I have to say it's much simpler than Perl.

wavic · Jun-22-2018, 03:08 PM

(Jun-22-2018, 02:49 PM)Mr_Keystrokes Wrote: I have to say it's much simpler than Perl.

No joking? :D
“Perl – The only language that looks the same before and after RSA encryption.”
