Python Forum
Looping through dictionary and comparing values with elements of a separate list.
#1
Apologies for the long question. Basically, I am trying to loop through a dictionary I've constructed and check whether the first two elements of each value (the two samples being compared) are both in a given list.

Test script:
Hash_Isolates={
	"1" : ['L02476-16_P_R1', 'AE006468', '873'],
	"2" : ['AE006468', 'AE006468', '40'],
	"3" : ['AE006468', 'L02476-16_P_R1', '756'],
	"4" : ['L00409-17_R1', 'L02476-16_P_R1', '987'],
	"5" : ['L00817-17_R1', 'AE006468', '65']
}

new_isolateList=['AE006468', 'L00817-17_R1']

my_Isolates=[]

for i in Hash_Isolates:
	if Hash_Isolates[i][0] in new_isolateList and Hash_Isolates[i][1] in new_isolateList:
		my_Isolates.append(Hash_Isolates[i])

print(len(my_Isolates))
Strangely, this test script works (it prints 2), but the proper script below doesn't.
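As an aside, the loop in the test script can be written a bit more idiomatically by iterating over the dictionary's values directly and keeping the isolate names in a set, which gives average constant-time membership checks instead of scanning a list. A minimal sketch using the same test data (the lowercase names are just illustrative):

```python
# Test data from the script above: row id -> [first sample, second sample, SNP distance]
hash_isolates = {
    "1": ['L02476-16_P_R1', 'AE006468', '873'],
    "2": ['AE006468', 'AE006468', '40'],
    "3": ['AE006468', 'L02476-16_P_R1', '756'],
    "4": ['L00409-17_R1', 'L02476-16_P_R1', '987'],
    "5": ['L00817-17_R1', 'AE006468', '65'],
}

# A set instead of a list: membership tests don't scan every element
new_isolates = {'AE006468', 'L00817-17_R1'}

# Keep a row only when both compared samples are in the isolate set
my_isolates = [
    row for row in hash_isolates.values()
    if row[0] in new_isolates and row[1] in new_isolates
]

print(len(my_isolates))  # prints 2
```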

#!/usr/bin/env python2.7

import getpass
import sys
import re


isolateFile=sys.argv[1]
snapper_data=sys.argv[2]

## Get the user ID ##
def get_User():
    currentUser = getpass.getuser()
    return currentUser


isolatePath='/home/'+get_User()+'/path/to/file/'+isolateFile
dataPath='/home/'+get_User()+'/path/to/file/'+snapper_data


# Retrieve isolates from file

isolateList=[]
with open(isolatePath, 'r') as file:
	isolateList=file.readlines()

new_isolateList=[]
for i in isolateList:
	try:
		x=re.search('(\w.....-?.?.?\d?)', str(i)).group(1)
	except:
		pass
	new_isolateList.append(x)

all_results=[]

with open(dataPath, 'r') as file:
	all_results=file.readlines()


# w is the position in the list of the samples being compared from the whole file
# x is first sample in comparison
# y is the second sample in comparison
# z is the SNP distance between the first and second samples
Hash_Isolates={}
for i in all_results:
	w=re.search('(.?.?.?.?.?.?.?),.+,.+,\d+\n', str(i)).group(1)
	x=re.search('.?.?.?.?.?.?.?,(.+),.+,\d+\n', str(i)).group(1)
	y=re.search('.?.?.?.?.?.?.?,.+,(.+),\d+\n', str(i)).group(1)
	z=re.search('.?.?.?.?.?.?.?,.+,.+,(\d+)\n', str(i)).group(1)
	Hash_Isolates[w]=[x, y, z]

my_Isolates=[]

for i in Hash_Isolates:
	if Hash_Isolates[i][0] in new_isolateList and Hash_Isolates[i][1] in new_isolateList:
		my_Isolates.append(Hash_Isolates[i])


print(len(my_Isolates))
So I expected this to work in the same way, but it prints 0. The snapper_data file has over 100k lines.
The data looks like this. The isolateFile is a text file with one isolate per line:

L01121-17_R1
AE006468
L00817-17_R1
L00665-17_R1

The snapper_data file is a CSV file:

1,L02476-16_P_R1,AE006468,873
2,L02476-16_P_R1,L02888-16_P_R1,2
3,L02476-16_P_R1,L00541-14_P_R1,914
4,L02476-16_P_R1,L02471-16_P_R1,842
5,AE006468,L02888-16_P_R1,832
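A quick check of the isolate regex against the sample isolateFile lines suggests why the full script prints 0: the pattern matches a fixed number of characters, so IDs like L00817-17_R1 are cut down to L00817-17, which never equal the full names in the CSV. Only AE006468 survives intact, so a row would only match if both of its samples were AE006468. A sketch comparing the original pattern with a plain str.strip():

```python
import re

# Sample lines as readlines() returns them from the isolateFile (each keeps its newline)
lines = ['L01121-17_R1\n', 'AE006468\n', 'L00817-17_R1\n', 'L00665-17_R1\n']

# The pattern from the original script, written as a raw string
pattern = r'(\w.....-?.?.?\d?)'
from_regex = [re.search(pattern, line).group(1) for line in lines]
print(from_regex)
# ['L01121-17', 'AE006468', 'L00817-17', 'L00665-17']

# Stripping the trailing newline keeps the full IDs
from_strip = [line.strip() for line in lines]
print(from_strip)
# ['L01121-17_R1', 'AE006468', 'L00817-17_R1', 'L00665-17_R1']
```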

I really want to get a good command of using dictionaries, but this is bugging me.
#2
Well, it looks like you're overcomplicating things.

import csv

with open('isolate_file.txt') as f:
    new_isolate = {line.strip() for line in f}

with open('snapper_data.csv') as sd:
    rdr = csv.reader(sd)
    my_isolates = [line[1:] for line in rdr if len(set(line[1:-1]) & new_isolate)==2]
             
print(my_isolates)
snapper_data.csv:

1,L02476-16_P_R1,AE006468,873
2,L02476-16_P_R1,L02888-16_P_R1,2
3,L02476-16_P_R1,L00541-14_P_R1,914
4,L02476-16_P_R1,L02471-16_P_R1,842
5,AE006468,L02888-16_P_R1,832
6,L01121-17_R1,AE006468,100

isolate_file.txt:

L01121-17_R1
AE006468
L00817-17_R1
L00665-17_R1

Output:

[['L01121-17_R1', 'AE006468', '100']]
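To try this approach without creating the two files, the sketch below feeds the same sample data through io.StringIO, which csv.reader accepts just like an open file. The in-memory strings are stand-ins for isolate_file.txt and snapper_data.csv:

```python
import csv
import io

# In-memory stand-ins for isolate_file.txt and snapper_data.csv
isolate_text = "L01121-17_R1\nAE006468\nL00817-17_R1\nL00665-17_R1\n"
snapper_text = (
    "1,L02476-16_P_R1,AE006468,873\n"
    "2,L02476-16_P_R1,L02888-16_P_R1,2\n"
    "3,L02476-16_P_R1,L00541-14_P_R1,914\n"
    "4,L02476-16_P_R1,L02471-16_P_R1,842\n"
    "5,AE006468,L02888-16_P_R1,832\n"
    "6,L01121-17_R1,AE006468,100\n"
)

# strip() removes the trailing newline so the IDs match the CSV fields exactly
new_isolate = {line.strip() for line in io.StringIO(isolate_text)}

rdr = csv.reader(io.StringIO(snapper_text))
# line[1:-1] holds the two compared samples; both must be in the isolate set
my_isolates = [line[1:] for line in rdr if len(set(line[1:-1]) & new_isolate) == 2]

print(my_isolates)  # [['L01121-17_R1', 'AE006468', '100']]
```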
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

#3
You see, this is why I like Python: there are so many simpler ways of doing things, you just have to know them. Now I've got to decipher what you've written. Thanks.

 
len(set(line[1:-1]) & new_isolate)==2
Hmm, this doesn't take into account the instances where the 2 elements being compared are the same.
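The cause is that a set drops duplicates: when both samples in a row are the same ID, set(line[1:-1]) has only one element, so the intersection can never reach 2. A small check, where row 7 is a hypothetical self-comparison not present in the sample file:

```python
new_isolate = {'L01121-17_R1', 'AE006468', 'L00817-17_R1', 'L00665-17_R1'}

# Hypothetical row comparing an isolate with itself
row = ['7', 'AE006468', 'AE006468', '0']

pair = set(row[1:-1])
print(pair)                     # {'AE006468'}
print(len(pair & new_isolate))  # 1, so the "== 2" test rejects this row
```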
#4
(Jun-22-2018, 01:12 PM) Mr_Keystrokes wrote: Hmm, this doesn't take into account the instances where the 2 elements being compared are the same.

Let's move the check into a function:
import csv

def check_line(line, isolate):
    my_set = set(line)
    return len(my_set & isolate) == len(my_set)

with open('isolate_file.txt') as f:
    new_isolate = {line.strip() for line in f}

with open('snapper_data.csv') as sd:
    rdr = csv.reader(sd)
    my_isolates = [line[1:] for line in rdr if check_line(line[1:-1], new_isolate)]
             
print(my_isolates)
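check_line is effectively a subset test: every distinct sample in the row must be in the isolate set. Python's set comparison operator <= (equivalent to set.issubset) says that directly. A sketch, where the function name in_isolates is just illustrative; unlike check_line, it also rejects the empty list that csv.reader yields for a blank line:

```python
new_isolate = {'L01121-17_R1', 'AE006468', 'L00817-17_R1', 'L00665-17_R1'}

def in_isolates(samples, isolate):
    """Return True if samples is non-empty and every ID in it is in isolate."""
    # set(samples) <= isolate is the subset test; duplicates collapse first,
    # so a row comparing AE006468 with itself is accepted
    return bool(samples) and set(samples) <= isolate

print(in_isolates(['L01121-17_R1', 'AE006468'], new_isolate))    # True
print(in_isolates(['AE006468', 'AE006468'], new_isolate))        # True
print(in_isolates(['L02476-16_P_R1', 'AE006468'], new_isolate))  # False
print(in_isolates([], new_isolate))                              # False
```

In the list comprehension above it would be called as in_isolates(line[1:-1], new_isolate).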
You can also do it like this:
import csv

with open('isolate_file.txt') as f:
    new_isolate = {line.strip() for line in f}

with open('snapper_data.csv') as sd:
    rdr = csv.reader(sd)
    my_isolates = [line[1:] for line in rdr if line[1] in new_isolate and line[2] in new_isolate]
             
print(my_isolates)    
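Since the thread started with a dictionary, here is a sketch of how csv.reader could build the same row number -> [first sample, second sample, SNP distance] mapping as the original script, then filter it with the explicit membership test. It uses a few rows from the sample data in memory; row 7, comparing AE006468 with itself, is hypothetical and only there to show that this version keeps identical pairs:

```python
import csv
import io

new_isolate = {'L01121-17_R1', 'AE006468', 'L00817-17_R1', 'L00665-17_R1'}

# In-memory stand-in for snapper_data.csv; row 7 is a hypothetical self-comparison
snapper_text = (
    "1,L02476-16_P_R1,AE006468,873\n"
    "5,AE006468,L02888-16_P_R1,832\n"
    "6,L01121-17_R1,AE006468,100\n"
    "7,AE006468,AE006468,0\n"
)

# Row number -> [first sample, second sample, SNP distance], as in the original script
hash_isolates = {}
for row in csv.reader(io.StringIO(snapper_text)):
    if not row:  # csv.reader yields [] for blank lines
        continue
    w, x, y, z = row
    hash_isolates[w] = [x, y, z]

# Both samples must be in the isolate set; identical pairs pass too
my_isolates = [v for v in hash_isolates.values()
               if v[0] in new_isolate and v[1] in new_isolate]

print(my_isolates)
# [['L01121-17_R1', 'AE006468', '100'], ['AE006468', 'AE006468', '0']]
```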
#5
Thanks, I like the last solution best. I didn't know about csv.reader, so that will be useful in the future, and I would never have thought to look up set(). I have to say it's much simpler than Perl.
#6
(Jun-22-2018, 02:49 PM) Mr_Keystrokes wrote: I have to say it's much simpler than Perl.
No joking? :D
“Perl – The only language that looks the same before and after RSA encryption.”
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org