Jun-22-2018, 12:05 PM
Apologies for the long question. Basically I am trying to loop through a dictionary I've constructed and check whether a specific element of the hash is in a given list.
Test script:
Hash_Isolates={
    "1" : ['L02476-16_P_R1', 'AE006468', '873'],
    "2" : ['AE006468', 'AE006468', '40'],
    "3" : ['AE006468', 'L02476-16_P_R1', '756'],
    "4" : ['L00409-17_R1', 'L02476-16_P_R1', '987'],
    "5" : ['L00817-17_R1', 'AE006468', '65']
}
new_isolateList=['AE006468', 'L00817-17_R1']
my_Isolates=[]
for i in Hash_Isolates:
    if Hash_Isolates[i][0] in new_isolateList and Hash_Isolates[i][1] in new_isolateList:
        my_Isolates.append(Hash_Isolates[i])
print(len(my_Isolates))

Strangely, when I run this test script it works, but when I run the proper script it doesn't.
For the test script you get 2 printed out.
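For comparison, the same check can be sketched with a set and a list comprehension. This only illustrates the filter on the test data above; the helper name `pairs_within` and the lowercase variable names are hypothetical and not part of the original script.

```python
# Same records as in the test script above.
hash_isolates = {
    "1": ['L02476-16_P_R1', 'AE006468', '873'],
    "2": ['AE006468', 'AE006468', '40'],
    "3": ['AE006468', 'L02476-16_P_R1', '756'],
    "4": ['L00409-17_R1', 'L02476-16_P_R1', '987'],
    "5": ['L00817-17_R1', 'AE006468', '65'],
}

# A set makes each membership test cheap, which may matter with 100k+ rows.
wanted = {'AE006468', 'L00817-17_R1'}


def pairs_within(records, wanted):
    """Return the records whose first two fields are both in `wanted`."""
    return [rec for rec in records.values()
            if rec[0] in wanted and rec[1] in wanted]


my_isolates = pairs_within(hash_isolates, wanted)
print(len(my_isolates))  # prints 2, matching the test script
```

Iterating over `.values()` avoids the repeated `Hash_Isolates[i]` lookups, but the result should be the same as the loop in the test script.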
#!/usr/bin/env python2.7
import getpass
import sys
import re

isolateFile=sys.argv[1]
snapper_data=sys.argv[2]

## Get the user ID ##
def get_User():
    currentUser = getpass.getuser()
    return currentUser

isolatePath='/home/'+get_User()+'/path/to/file/'+isolateFile
dataPath='/home/'+get_User()+'/path/to/file/'+snapper_data

# Retrieve isolates from file
isolateList=[]
with open(isolatePath, 'r') as file:
    isolateList=file.readlines()

new_isolateList=[]
for i in isolateList:
    try:
        x=re.search('(\w.....-?.?.?\d?)', str(i)).group(1)
    except:
        pass
    new_isolateList.append(x)

all_results=[]
with open(dataPath, 'r') as file:
    all_results=file.readlines()

# w is the position in the list of the samples being compared from the whole file
# x is first sample in comparison
# y is the second sample in comparison
# z is the SNP distance between the first and second samples
Hash_Isolates={}
for i in all_results:
    w=re.search('(.?.?.?.?.?.?.?),.+,.+,\d+\n', str(i)).group(1)
    x=re.search('.?.?.?.?.?.?.?,(.+),.+,\d+\n', str(i)).group(1)
    y=re.search('.?.?.?.?.?.?.?,.+,(.+),\d+\n', str(i)).group(1)
    z=re.search('.?.?.?.?.?.?.?,.+,.+,(\d+)\n', str(i)).group(1)
    Hash_Isolates[w]=[x, y, z]

my_Isolates=[]
for i in Hash_Isolates:
    if Hash_Isolates[i][0] in new_isolateList and Hash_Isolates[i][1] in new_isolateList:
        my_Isolates.append(Hash_Isolates[i])
print(len(my_Isolates))

So I expect this to work in the same way, but it prints 0. The snapper_data file has 100k+ lines.
The data looks like this:
The isolateFile is a text file:
L01121-17_R1
AE006468
L00817-17_R1
L00665-17_R1
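One thing that may be worth checking is what the isolate regex from the script actually captures on lines like these. The sketch below applies the same pattern to the four sample lines, written as `readlines()` would return them, and compares the result with a plain `strip()`. Whether this explains the 0 on the real data is an assumption; the names `ISOLATE_PATTERN` and `capture` are hypothetical.

```python
import re

# Pattern copied unchanged from the script above.
ISOLATE_PATTERN = r'(\w.....-?.?.?\d?)'

# The sample isolate lines, with the trailing newline that readlines() keeps.
sample_lines = ['L01121-17_R1\n', 'AE006468\n', 'L00817-17_R1\n', 'L00665-17_R1\n']


def capture(line):
    """Return what the script's regex captures, or None if nothing matches."""
    match = re.search(ISOLATE_PATTERN, line)
    return match.group(1) if match else None


captured = [capture(line) for line in sample_lines]
stripped = [line.strip() for line in sample_lines]

for name, cap in zip(stripped, captured):
    # Flag every line where the regex result differs from the full name.
    status = 'same' if name == cap else 'differs'
    print(repr(name), '->', repr(cap), status)
```

On these samples the pattern appears to stop before the `_R1` suffix, so only `AE006468` survives intact. If the real file looks the same, names stored in `new_isolateList` would not equal the full names held in the dictionary, and `line.strip()` might be a simpler way to get the isolate name.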
The snapper_data file is a CSV file:
1,L02476-16_P_R1,AE006468,873
2,L02476-16_P_R1,L02888-16_P_R1,2
3,L02476-16_P_R1,L00541-14_P_R1,914
4,L02476-16_P_R1,L02471-16_P_R1,842
5,AE006468,L02888-16_P_R1,832
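Since the rows are plain comma-separated fields, the dictionary could probably be built with the standard `csv` module instead of four regex searches per line. The sketch below is written for Python 3 and uses an in-memory copy of the sample rows above; the function name `load_comparisons` is hypothetical, and the assumption is that every valid row has exactly four fields.

```python
import csv
import io

# In-memory stand-in for the snapper_data file, using the sample rows above.
sample_csv = (
    "1,L02476-16_P_R1,AE006468,873\n"
    "2,L02476-16_P_R1,L02888-16_P_R1,2\n"
    "3,L02476-16_P_R1,L00541-14_P_R1,914\n"
    "4,L02476-16_P_R1,L02471-16_P_R1,842\n"
    "5,AE006468,L02888-16_P_R1,832\n"
)


def load_comparisons(handle):
    """Map row index -> [first sample, second sample, SNP distance]."""
    comparisons = {}
    for row in csv.reader(handle):
        if len(row) != 4:
            continue  # skip blank or malformed rows (assumed layout: 4 fields)
        index, first, second, distance = (field.strip() for field in row)
        comparisons[index] = [first, second, distance]
    return comparisons


hash_isolates = load_comparisons(io.StringIO(sample_csv))
print(hash_isolates["1"])  # ['L02476-16_P_R1', 'AE006468', '873']
```

With a real file, the `io.StringIO(...)` argument would be replaced by the handle from `open(dataPath, newline='')`. A last line with no trailing newline would also be handled here, whereas the `\n` in the regexes above would fail to match it.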
I really want to get a good command of dictionaries, but this one is bugging me.