 Looping through dictionary and comparing values with elements of a separate list.
#1
Apologies for the long question. Basically I am trying to loop through a dictionary I've constructed and check whether a specific element of the hash is in a given list.

Test script:
Hash_Isolates={
	"1" : ['L02476-16_P_R1', 'AE006468', '873'],
	"2" : ['AE006468', 'AE006468', '40'],
	"3" : ['AE006468', 'L02476-16_P_R1', '756'],
	"4" : ['L00409-17_R1', 'L02476-16_P_R1', '987'],
	"5" : ['L00817-17_R1', 'AE006468', '65']
}

new_isolateList=['AE006468', 'L00817-17_R1']

my_Isolates=[]

for i in Hash_Isolates:
	if Hash_Isolates[i][0] in new_isolateList and Hash_Isolates[i][1] in new_isolateList:
		my_Isolates.append(Hash_Isolates[i])

print(len(my_Isolates))
Strangely, when I run this test script it works, but when I run the proper script it doesn't.
For the test script you get 2 printed out.

#!/usr/bin/env python2.7

import getpass
import sys
import re


isolateFile=sys.argv[1]
snapper_data=sys.argv[2]

## Get the user ID ##
def get_User():
    currentUser = getpass.getuser()
    return currentUser


isolatePath='/home/'+get_User()+'/path/to/file/'+isolateFile
dataPath='/home/'+get_User()+'/path/to/file/'+snapper_data


# Retrieve isolates from file

isolateList=[]
with open(isolatePath, 'r') as file:
	isolateList=file.readlines()

new_isolateList=[]
for i in isolateList:
	try:
	    x=re.search('(\w.....-?.?.?\d?)', str(i)).group(1)
	except:
		pass
	new_isolateList.append(x)

all_results=[]

with open(dataPath, 'r') as file:
	all_results=file.readlines()


# w is the position in the list of the samples being compared from the whole file
# x is first sample in comparison
# y is the second sample in comparison
# z is the SNP distance between the first and second samples
Hash_Isolates={}
for i in all_results:
	w=re.search('(.?.?.?.?.?.?.?),.+,.+,\d+\n', str(i)).group(1)
	x=re.search('.?.?.?.?.?.?.?,(.+),.+,\d+\n', str(i)).group(1)
	y=re.search('.?.?.?.?.?.?.?,.+,(.+),\d+\n', str(i)).group(1)
	z=re.search('.?.?.?.?.?.?.?,.+,.+,(\d+)\n', str(i)).group(1)
	Hash_Isolates[w]=[x, y, z]

my_Isolates=[]

for i in Hash_Isolates:
	if Hash_Isolates[i][0] in new_isolateList and Hash_Isolates[i][1] in new_isolateList:
		my_Isolates.append(Hash_Isolates[i])


print(len(my_Isolates))
So I expect this to work in the same way, but it prints 0. The snapper_data file has 100k+ lines.
The data looks like this:
The isolateFile is a text file:

L01121-17_R1
AE006468
L00817-17_R1
L00665-17_R1

The snapper_data file is a CSV file:

1,L02476-16_P_R1,AE006468,873
2,L02476-16_P_R1,L02888-16_P_R1,2
3,L02476-16_P_R1,L00541-14_P_R1,914
4,L02476-16_P_R1,L02471-16_P_R1,842
5,AE006468,L02888-16_P_R1,832

I'm really desperate to get command of using dictionaries but this is bugging me.
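
One way to see why the full script might print 0 is to run the isolate regex against the sample names shown above. This is a minimal sketch, assuming the isolate file holds exactly those four names; `sample_lines`, `extracted` and `stripped` are illustrative names, not part of the script.

```python
import re

# Sample lines as file.readlines() would return them (trailing newline kept).
sample_lines = ["L01121-17_R1\n", "AE006468\n", "L00817-17_R1\n", "L00665-17_R1\n"]

# The pattern used in the script above to pull out each isolate name.
pattern = r'(\w.....-?.?.?\d?)'

extracted = [re.search(pattern, line).group(1) for line in sample_lines]
print(extracted)
# ['L01121-17', 'AE006468', 'L00817-17', 'L00665-17']

# Stripping whitespace instead keeps each full name.
stripped = [line.strip() for line in sample_lines]
print(stripped)
# ['L01121-17_R1', 'AE006468', 'L00817-17_R1', 'L00665-17_R1']
```

The names ending in `_R1` come out truncated, so they can never equal the full names captured from the CSV columns. Only `AE006468` survives intact, which would explain a count of 0 if no row in the real data pairs `AE006468` with itself.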
#2
Well, it looks like you are overcomplicating things.

import csv

with open('isolate_file.txt') as f:
    new_isolate = {line.strip() for line in f}

with open('snapper_data.csv') as sd:
    rdr = csv.reader(sd)
    my_isolates = [line[1:] for line in rdr if len(set(line[1:-1]) & new_isolate)==2]
             
print(my_isolates)
snapper_data.csv:

1,L02476-16_P_R1,AE006468,873
2,L02476-16_P_R1,L02888-16_P_R1,2
3,L02476-16_P_R1,L00541-14_P_R1,914
4,L02476-16_P_R1,L02471-16_P_R1,842
5,AE006468,L02888-16_P_R1,832
6,L01121-17_R1,AE006468,100

isolate_file.txt:

L01121-17_R1
AE006468
L00817-17_R1
L00665-17_R1

Output:

[['L01121-17_R1', 'AE006468', '100']]
#3
You see, this is why I like Python. There are so many simpler ways of doing things; you just have to know them. Now I've got to decipher what you've written. Thanks.

 
len(set(line[1:-1]) & new_isolate)==2]
Hmm, this doesn't take into account the instances where the 2 elements being compared are the same.
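
To illustrate the point, here is a small sketch using row "2" from the test script in the first post, where both samples are `AE006468`. The names `row`, `pair` and `overlap` are made up for the example.

```python
new_isolate = {"AE006468", "L00817-17_R1"}

# A row that compares an isolate with itself (row "2" of the test script).
row = ["2", "AE006468", "AE006468", "40"]

pair = set(row[1:-1])         # duplicates collapse into one element
overlap = pair & new_isolate  # intersection with the isolate set

print(len(pair))              # 1
print(len(overlap) == 2)      # False, so this row would be skipped
```

Because a set keeps only one copy of each value, the intersection can never reach length 2 for such a row, even though both samples are in the isolate list.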
#4
(Jun-22-2018, 01:12 PM)Mr_Keystrokes Wrote: Hmm, this doesn't take into account the instances where the 2 elements being compared are the same.

Let's move the check into a function:
import csv

def check_line(line, isolate):
    my_set = set(line)
    return len(my_set & isolate) == len(my_set)

with open('isolate_file.txt') as f:
    new_isolate = {line.strip() for line in f}

with open('snapper_data.csv') as sd:
    rdr = csv.reader(sd)
    my_isolates = [line[1:] for line in rdr if check_line(line[1:-1], new_isolate)]
             
print(my_isolates)
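
The check in `check_line` amounts to asking whether every name in the row is in the isolate set, which is a subset test. Below is a minimal sketch of that equivalence using the built-in `set.issubset`. The function name `check_line_subset` and the sample rows are assumptions for illustration only.

```python
def check_line(line, isolate):
    my_set = set(line)
    return len(my_set & isolate) == len(my_set)

def check_line_subset(line, isolate):
    # Hypothetical alternative: True when every element of line is in isolate.
    return set(line).issubset(isolate)

new_isolate = {"L01121-17_R1", "AE006468", "L00817-17_R1", "L00665-17_R1"}
rows = [
    ["L01121-17_R1", "AE006468"],    # both listed
    ["AE006468", "AE006468"],        # same isolate twice
    ["L02476-16_P_R1", "AE006468"],  # first one not listed
]

results = [(check_line(r, new_isolate), check_line_subset(r, new_isolate)) for r in rows]
print(results)
# [(True, True), (True, True), (False, False)]
```

Both versions accept the row with the same isolate twice, which the earlier `== 2` comparison rejected.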
You can also do it like this:
import csv

with open('isolate_file.txt') as f:
    new_isolate = {line.strip() for line in f}

with open('snapper_data.csv') as sd:
    rdr = csv.reader(sd)
    my_isolates = [line[1:] for line in rdr if line[1] in new_isolate and line[2] in new_isolate]
             
print(my_isolates)    
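
Since the original question was about dictionaries, the same filter can also keep the row number as a key. This is a rough sketch, assuming the first CSV column is a unique row number as in the sample data. `io.StringIO` stands in for the real files here, and the names `hash_isolates`, `isolate_text` and `snapper_text` are illustrative.

```python
import csv
import io

# Stand-ins for the real files, using sample data from this thread.
isolate_text = "L01121-17_R1\nAE006468\nL00817-17_R1\nL00665-17_R1\n"
snapper_text = (
    "1,L02476-16_P_R1,AE006468,873\n"
    "5,AE006468,L02888-16_P_R1,832\n"
    "6,L01121-17_R1,AE006468,100\n"
)

new_isolate = {line.strip() for line in io.StringIO(isolate_text)}

# Map row number -> [first sample, second sample, SNP distance].
hash_isolates = {row[0]: row[1:] for row in csv.reader(io.StringIO(snapper_text))}

# Keep only the entries whose two samples are both in the isolate set.
my_isolates = {
    key: value
    for key, value in hash_isolates.items()
    if value[0] in new_isolate and value[1] in new_isolate
}
print(my_isolates)
# {'6': ['L01121-17_R1', 'AE006468', '100']}
```

Membership tests against a set take roughly constant time on average, which helps when the data file has 100k+ lines.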
#5
Thanks, I like the last solution best. Didn't know about csv reader so that will be useful in the future. And I would never have looked up set(). I have to say it's much simpler than Perl.
#6
(Jun-22-2018, 02:49 PM)Mr_Keystrokes Wrote: I have to say it's much simpler than Perl.
No joking? :D
“Perl – The only language that looks the same before and after RSA encryption.”
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."