Jun-22-2018, 12:05 PM
Apologies for the long question. Basically, I am trying to loop through a dictionary I've constructed and check whether specific elements of each entry are in a given list.
Strangely, when I run a small test script it works, but when I run the proper script, which I expect to behave the same way, it prints 0. The snapper_data file has 100k+ lines.
Test script:
```python
Hash_Isolates = {
    "1": ['L02476-16_P_R1', 'AE006468', '873'],
    "2": ['AE006468', 'AE006468', '40'],
    "3": ['AE006468', 'L02476-16_P_R1', '756'],
    "4": ['L00409-17_R1', 'L02476-16_P_R1', '987'],
    "5": ['L00817-17_R1', 'AE006468', '65'],
}

new_isolateList = ['AE006468', 'L00817-17_R1']

my_Isolates = []
for i in Hash_Isolates:
    if Hash_Isolates[i][0] in new_isolateList and Hash_Isolates[i][1] in new_isolateList:
        my_Isolates.append(Hash_Isolates[i])

print(len(my_Isolates))
```
For the test script you get 2 printed out.
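For comparison, the same filter can be written by iterating over `.values()` and checking membership in a set instead of a list; with 100k+ rows a set lookup is constant time on average, while `in` on a list scans it. This is only a sketch on the test data above; `wanted` is an illustrative name, not something from the original script.

```python
# Same test data as the test script
Hash_Isolates = {
    "1": ['L02476-16_P_R1', 'AE006468', '873'],
    "2": ['AE006468', 'AE006468', '40'],
    "3": ['AE006468', 'L02476-16_P_R1', '756'],
    "4": ['L00409-17_R1', 'L02476-16_P_R1', '987'],
    "5": ['L00817-17_R1', 'AE006468', '65'],
}

# Hypothetical name: a set of the isolates we care about
wanted = {'AE006468', 'L00817-17_R1'}

# Keep every row whose first and second samples are both wanted
my_Isolates = [row for row in Hash_Isolates.values()
               if row[0] in wanted and row[1] in wanted]

print(len(my_Isolates))  # prints 2 (rows "2" and "5")
```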
Proper script:

```python
#!/usr/bin/env python2.7
import getpass
import sys
import re

isolateFile = sys.argv[1]
snapper_data = sys.argv[2]

## Get the user ID ##
def get_User():
    currentUser = getpass.getuser()
    return currentUser

isolatePath = '/home/' + get_User() + '/path/to/file/' + isolateFile
dataPath = '/home/' + get_User() + '/path/to/file/' + snapper_data

# Retrieve isolates from file
isolateList = []
with open(isolatePath, 'r') as file:
    isolateList = file.readlines()

new_isolateList = []
for i in isolateList:
    try:
        x = re.search(r'(\w.....-?.?.?\d?)', str(i)).group(1)
    except:
        pass
    new_isolateList.append(x)

all_results = []
with open(dataPath, 'r') as file:
    all_results = file.readlines()

# w is the position in the list of the samples being compared from the whole file
# x is first sample in comparison
# y is the second sample in comparison
# z is the SNP distance between the first and second samples
Hash_Isolates = {}
for i in all_results:
    w = re.search(r'(.?.?.?.?.?.?.?),.+,.+,\d+\n', str(i)).group(1)
    x = re.search(r'.?.?.?.?.?.?.?,(.+),.+,\d+\n', str(i)).group(1)
    y = re.search(r'.?.?.?.?.?.?.?,.+,(.+),\d+\n', str(i)).group(1)
    z = re.search(r'.?.?.?.?.?.?.?,.+,.+,(\d+)\n', str(i)).group(1)
    Hash_Isolates[w] = [x, y, z]

my_Isolates = []
for i in Hash_Isolates:
    if Hash_Isolates[i][0] in new_isolateList and Hash_Isolates[i][1] in new_isolateList:
        my_Isolates.append(Hash_Isolates[i])

print(len(my_Isolates))
```
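A likely culprit is the regular expression used to clean up the isolate names. It matches at most 10 characters (`\w`, five `.`, then an optional `-`, two optional characters and an optional digit), so a 12-character name such as `L00817-17_R1` is cut short and can never equal the full name in the CSV. A quick check on the isolate names shown in this post:

```python
import re

# Names as returned by readlines(), trailing newline included
lines = ['L01121-17_R1\n', 'AE006468\n', 'L00817-17_R1\n', 'L00665-17_R1\n']

# The pattern from the proper script
pattern = r'(\w.....-?.?.?\d?)'
truncated = [re.search(pattern, line).group(1) for line in lines]
print(truncated)
# ['L01121-17', 'AE006468', 'L00817-17', 'L00665-17']

# Stripping the newline keeps the whole name
stripped = [line.strip() for line in lines]
print(stripped)
# ['L01121-17_R1', 'AE006468', 'L00817-17_R1', 'L00665-17_R1']
```

So a row such as `['L00817-17_R1', 'AE006468', ...]` is never counted, because `'L00817-17_R1' in new_isolateList` is False; only `AE006468` survives the regex intact. Replacing the `re.search` line with `x = i.strip()` should fix this, assuming the names in both files use exactly the same spelling. Note also that the bare `except: pass` silently re-appends the previous `x` whenever the pattern fails to match.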
The data looks like this:
The isolateFile is a text file:
L01121-17_R1
AE006468
L00817-17_R1
L00665-17_R1
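A minimal sketch of reading this file straight into a set of clean names, assuming one name per line. Here `io.StringIO` stands in for `open(isolatePath)` so the example runs on its own; it is written for Python 3.

```python
import io

# Stand-in for open(isolatePath); contents copied from the isolate file sample
isolate_file = io.StringIO(u"L01121-17_R1\nAE006468\nL00817-17_R1\nL00665-17_R1\n")

# strip() removes the trailing newline (and a '\r' from Windows line endings);
# blank lines are skipped rather than replaced by a stale value
isolates = {line.strip() for line in isolate_file if line.strip()}

print(sorted(isolates))
```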
The snapper_data file is a CSV file:
1,L02476-16_P_R1,AE006468,873
2,L02476-16_P_R1,L02888-16_P_R1,2
3,L02476-16_P_R1,L00541-14_P_R1,914
4,L02476-16_P_R1,L02471-16_P_R1,842
5,AE006468,L02888-16_P_R1,832
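For the CSV, the standard `csv` module splits each row on commas, which replaces the four separate regexes (those also raise `AttributeError` on a last line without a trailing newline). A sketch for Python 3, again with `io.StringIO` in place of the real file and the rows shown above; the `isolates` set is assumed to hold the stripped names from the isolate file.

```python
import csv
import io

# Stand-in for open(dataPath); rows copied from the snapper_data sample
data_file = io.StringIO(u"1,L02476-16_P_R1,AE006468,873\n"
                        u"2,L02476-16_P_R1,L02888-16_P_R1,2\n"
                        u"3,L02476-16_P_R1,L00541-14_P_R1,914\n"
                        u"4,L02476-16_P_R1,L02471-16_P_R1,842\n"
                        u"5,AE006468,L02888-16_P_R1,832\n")

# Assumed: full names from the isolate file, newlines stripped
isolates = {'L01121-17_R1', 'AE006468', 'L00817-17_R1', 'L00665-17_R1'}

# {row id: [first sample, second sample, SNP distance]}
Hash_Isolates = {}
for row_id, first, second, distance in csv.reader(data_file):
    Hash_Isolates[row_id] = [first, second, int(distance)]

my_Isolates = [row for row in Hash_Isolates.values()
               if row[0] in isolates and row[1] in isolates]
print(len(my_Isolates))
```

On these five sample rows alone the count is genuinely 0, since no row has both samples in the isolate list. Also note that the CSV names carry a `_P_` infix (`L02476-16_P_R1`) that the isolate-file names lack, so the two files only match if they use the same naming.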
I'm really desperate to get command of using dictionaries but this is bugging me.