Python Forum
Looping through dictionary and comparing values with elements of a separate list.
#1
Apologies for the long question. Basically, I am trying to loop through a dictionary I've constructed and check whether specific elements of each value are in a given list.

Test script:
Hash_Isolates={
	"1" : ['L02476-16_P_R1', 'AE006468', '873'],
	"2" : ['AE006468', 'AE006468', '40'],
	"3" : ['AE006468', 'L02476-16_P_R1', '756'],
	"4" : ['L00409-17_R1', 'L02476-16_P_R1', '987'],
	"5" : ['L00817-17_R1', 'AE006468', '65']
}

new_isolateList=['AE006468', 'L00817-17_R1']

my_Isolates=[]

for i in Hash_Isolates:
	if Hash_Isolates[i][0] in new_isolateList and Hash_Isolates[i][1] in new_isolateList:
		my_Isolates.append(Hash_Isolates[i])

print(len(my_Isolates))
Strangely, this test script works, but the real script doesn't.
The test script prints 2.
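For comparison, here is a slightly more idiomatic sketch of the same test filter. It iterates over the dictionary's values directly instead of indexing by key, and it turns the lookup list into a set, which gives average constant-time membership checks (a list is scanned element by element, which can matter with 100k+ rows). The data is copied from the test script; `isolate_set` is a hypothetical name introduced here.

```python
# Sample data copied from the test script above
Hash_Isolates = {
    "1": ['L02476-16_P_R1', 'AE006468', '873'],
    "2": ['AE006468', 'AE006468', '40'],
    "3": ['AE006468', 'L02476-16_P_R1', '756'],
    "4": ['L00409-17_R1', 'L02476-16_P_R1', '987'],
    "5": ['L00817-17_R1', 'AE006468', '65'],
}

# Hypothetical name: a set gives fast membership tests; a list is scanned linearly
isolate_set = {'AE006468', 'L00817-17_R1'}

# Iterate over the values directly; keep entries whose first two fields are both wanted
my_Isolates = [
    value for value in Hash_Isolates.values()
    if value[0] in isolate_set and value[1] in isolate_set
]

print(len(my_Isolates))  # 2
```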

#!/usr/bin/env python2.7

import getpass
import sys
import re


isolateFile=sys.argv[1]
snapper_data=sys.argv[2]

## Get the user ID ##
def get_User():
    currentUser = getpass.getuser()
    return currentUser


isolatePath='/home/'+get_User()+'/path/to/file/'+isolateFile
dataPath='/home/'+get_User()+'/path/to/file/'+snapper_data


# Retrieve isolates from file

isolateList=[]
with open(isolatePath, 'r') as file:
	isolateList=file.readlines()

new_isolateList=[]
for i in isolateList:
	try:
	    x=re.search('(\w.....-?.?.?\d?)', str(i)).group(1)
	except:
		pass
	new_isolateList.append(x)

all_results=[]

with open(dataPath, 'r') as file:
	all_results=file.readlines()


# w is the position in the list of the samples being compared from the whole file
# x is first sample in comparison
# y is the second sample in comparison
# z is the SNP distance between the first and second samples
Hash_Isolates={}
for i in all_results:
	w=re.search('(.?.?.?.?.?.?.?),.+,.+,\d+\n', str(i)).group(1)
	x=re.search('.?.?.?.?.?.?.?,(.+),.+,\d+\n', str(i)).group(1)
	y=re.search('.?.?.?.?.?.?.?,.+,(.+),\d+\n', str(i)).group(1)
	z=re.search('.?.?.?.?.?.?.?,.+,.+,(\d+)\n', str(i)).group(1)
	Hash_Isolates[w]=[x, y, z]

my_Isolates=[]

for i in Hash_Isolates:
	if Hash_Isolates[i][0] in new_isolateList and Hash_Isolates[i][1] in new_isolateList:
		my_Isolates.append(Hash_Isolates[i])


print(len(my_Isolates))
So I expect this to work in the same way, but it prints 0. This snapper_data file has 100k + lines.
The data looks like this.
The isolateFile is a text file with one isolate name per line:

L01121-17_R1
AE006468
L00817-17_R1
L00665-17_R1

The snapper_data file is a CSV file (row number, first sample, second sample, SNP distance):

1,L02476-16_P_R1,AE006468,873
2,L02476-16_P_R1,L02888-16_P_R1,2
3,L02476-16_P_R1,L00541-14_P_R1,914
4,L02476-16_P_R1,L02471-16_P_R1,842
5,AE006468,L02888-16_P_R1,832

I really want to get a good command of dictionaries, but this is bugging me.
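A likely cause of the 0 (an editor's diagnosis, not something confirmed in the thread): the regex used to read isolateFile does not capture whole names. For `L01121-17_R1`, the pattern `(\w.....-?.?.?\d?)` stops before `_R1`, so new_isolateList ends up holding `L01121-17`, which never equals the names in the CSV; only `AE006468` survives intact. Separately, the bare `except: pass` means a line the regex cannot match (for example a blank line) would append the previous value of `x` again. The sketch below runs the pattern on the sample names next to plain `str.strip()`; the helpers `regex_extract` and `strip_extract` are hypothetical names for illustration.

```python
import re

# Pattern copied from the original script
PATTERN = r'(\w.....-?.?.?\d?)'

def regex_extract(raw_line):
    """Hypothetical helper: extract a name the way the original script does."""
    match = re.search(PATTERN, raw_line)
    return match.group(1) if match else None

def strip_extract(raw_line):
    """Hypothetical helper: just drop the trailing newline and surrounding spaces."""
    return raw_line.strip()

# Sample lines as they come back from file.readlines()
raw_lines = ['L01121-17_R1\n', 'AE006468\n', 'L00817-17_R1\n', 'L00665-17_R1\n']

for raw in raw_lines:
    print(repr(strip_extract(raw)), '->', repr(regex_extract(raw)))
# 'L01121-17_R1' -> 'L01121-17'
# 'AE006468' -> 'AE006468'
# 'L00817-17_R1' -> 'L00817-17'
# 'L00665-17_R1' -> 'L00665-17'
```

Replacing the regex loop with something like `new_isolateList = [line.strip() for line in isolateList if line.strip()]` should keep the full names and skip blank lines.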
#2
Well, it looks like you are overcomplicating things.

import csv

with open('isolate_file.txt') as f:
    new_isolate = {line.strip() for line in f}

with open('snapper_data.csv') as sd:
    rdr = csv.reader(sd)
    my_isolates = [line[1:] for line in rdr if len(set(line[1:-1]) & new_isolate)==2]
             
print(my_isolates)
snapper_data.csv:

1,L02476-16_P_R1,AE006468,873
2,L02476-16_P_R1,L02888-16_P_R1,2
3,L02476-16_P_R1,L00541-14_P_R1,914
4,L02476-16_P_R1,L02471-16_P_R1,842
5,AE006468,L02888-16_P_R1,832
6,L01121-17_R1,AE006468,100

isolate_file.txt:

L01121-17_R1
AE006468
L00817-17_R1
L00665-17_R1

Output:

[['L01121-17_R1', 'AE006468', '100']]
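Since the original goal was practice with dictionaries, here is a sketch of how the same csv.reader approach could still build the row-number-keyed dictionary from the question and then filter it. It reads an in-memory io.StringIO holding the six sample rows instead of snapper_data.csv so that it runs on its own; with real data you would open the file instead (in Python 3, `open('snapper_data.csv', newline='')`). The names `hash_isolates` and `sample_csv` are hypothetical.

```python
import csv
import io

# In-memory stand-in for snapper_data.csv (rows copied from the sample above)
sample_csv = io.StringIO(
    "1,L02476-16_P_R1,AE006468,873\n"
    "2,L02476-16_P_R1,L02888-16_P_R1,2\n"
    "3,L02476-16_P_R1,L00541-14_P_R1,914\n"
    "4,L02476-16_P_R1,L02471-16_P_R1,842\n"
    "5,AE006468,L02888-16_P_R1,832\n"
    "6,L01121-17_R1,AE006468,100\n"
)
new_isolate = {'L01121-17_R1', 'AE006468', 'L00817-17_R1', 'L00665-17_R1'}

# Key each row by its first column; store the remaining three fields as the value
hash_isolates = {row[0]: row[1:] for row in csv.reader(sample_csv)}

# Same filter as the original loop, written over the dictionary's items
my_isolates = {
    key: value for key, value in hash_isolates.items()
    if value[0] in new_isolate and value[1] in new_isolate
}

print(my_isolates)  # {'6': ['L01121-17_R1', 'AE006468', '100']}
```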
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

#3
You see, this is why I like Python: there are so many simpler ways of doing things, you just have to know them. Now I've got to decipher what you've written. Thanks.
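To unpack the key line of the answer, here is its filter applied by hand to one row (row 6 of the sample snapper_data.csv, as csv.reader would yield it). The variable names `pair` and `common` are only for illustration.

```python
row = ['6', 'L01121-17_R1', 'AE006468', '100']  # one line as yielded by csv.reader
new_isolate = {'L01121-17_R1', 'AE006468', 'L00817-17_R1', 'L00665-17_R1'}

pair = row[1:-1]                  # drop the row number and the distance
common = set(pair) & new_isolate  # set intersection: names found in both

print(pair)              # ['L01121-17_R1', 'AE006468']
print(sorted(common))    # ['AE006468', 'L01121-17_R1']
print(len(common) == 2)  # True, so row[1:] is kept
print(row[1:])           # ['L01121-17_R1', 'AE006468', '100']
```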

 
len(set(line[1:-1]) & new_isolate) == 2
Hmm, this doesn't handle the rows where the two elements being compared are the same.
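The objection holds: a set keeps only one copy of each element, so when both names in a row are identical the intersection can have at most one member and the `== 2` test rejects the row. A quick check using the values of entry "2" from the test script:

```python
new_isolate = {'L01121-17_R1', 'AE006468', 'L00817-17_R1', 'L00665-17_R1'}
row = ['2', 'AE006468', 'AE006468', '40']  # both names identical

pair_as_set = set(row[1:-1])  # duplicates collapse to a single element
print(len(pair_as_set))                     # 1
print(len(pair_as_set & new_isolate) == 2)  # False, so the row is dropped
```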
#4
(Jun-22-2018, 01:12 PM)Mr_Keystrokes Wrote: Hmm, this doesn't take into account the instances where the 2 elements being compared are the same.

Let's move the check into a function:
import csv

def check_line(line, isolate):
    my_set = set(line)
    return len(my_set & isolate) == len(my_set)

with open('isolate_file.txt') as f:
    new_isolate = {line.strip() for line in f}

with open('snapper_data.csv') as sd:
    rdr = csv.reader(sd)
    my_isolates = [line[1:] for line in rdr if check_line(line[1:-1], new_isolate)]
             
print(my_isolates)
You can also do it like this:
import csv

with open('isolate_file.txt') as f:
    new_isolate = {line.strip() for line in f}

with open('snapper_data.csv') as sd:
    rdr = csv.reader(sd)
    my_isolates = [line[1:] for line in rdr if line[1] in new_isolate and line[2] in new_isolate]
             
print(my_isolates)    
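The check_line comparison `len(my_set & isolate) == len(my_set)` is a subset test (every name in the pair is in the isolate set), which Python also spells `my_set <= isolate` or `my_set.issubset(isolate)`. The sketch below uses an in-memory io.StringIO stand-in for snapper_data.csv, with a hypothetical row 7 containing a duplicated name added for illustration, and checks that the subset form and the explicit `in ... and ... in` form keep the same rows.

```python
import csv
import io

# In-memory stand-in for snapper_data.csv; row 7 is a hypothetical duplicate-name row
sample_csv = io.StringIO(
    "1,L02476-16_P_R1,AE006468,873\n"
    "5,AE006468,L02888-16_P_R1,832\n"
    "6,L01121-17_R1,AE006468,100\n"
    "7,AE006468,AE006468,40\n"
)
new_isolate = {'L01121-17_R1', 'AE006468', 'L00817-17_R1', 'L00665-17_R1'}
rows = list(csv.reader(sample_csv))

# Subset test: every name in the pair must appear in new_isolate
by_subset = [row[1:] for row in rows if set(row[1:-1]) <= new_isolate]

# Explicit membership test, as in the last answer
by_membership = [row[1:] for row in rows
                 if row[1] in new_isolate and row[2] in new_isolate]

print(by_subset == by_membership)  # True
print(by_subset)  # [['L01121-17_R1', 'AE006468', '100'], ['AE006468', 'AE006468', '40']]
```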
#5
Thanks, I like the last solution best. I didn't know about csv.reader, so that will be useful in the future. And I would never have thought to look up set(). I have to say it's much simpler than Perl.
#6
(Jun-22-2018, 02:49 PM)Mr_Keystrokes Wrote: I have to say it's much simpler than Perl.
No joking? :D
“Perl – The only language that looks the same before and after RSA encryption.”
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org