Jun-08-2024, 12:11 PM
The compare routine below works fine when I constrain the comparison to specific rows and character columns. However, the column limits are useless when I want to compare two TSV (tab-delimited, "\t") files, because the columns are no longer reliably aligned. I tried reading the files in as TSVs with csv.reader, but difference.compare does not like this at all and raises "TypeError: unhashable type: 'list'". Can somebody advise what I need to change so that my compare routine works with CSV-type files, limited to specific elements of each row's list, to give similar functionality to the column parameters? Please assume you are dealing with a newbie.
start_rec = 0
stop_rec = 999999
start_col = 0
end_col = 999999
############### END OF USER PARAMETERS ####################
encode = "utf-8"
cnt1=0
cnt2=0
with open(val_input_set1, "r",encoding=encode) as f1,\
     open(val_input_set2, "r",encoding=encode) as f2,\
     open(val_output_set, "w", newline='') as f3:
    r1 = list(itertools.islice(f1, start_rec, stop_rec))
    r2 = list(itertools.islice(f2, start_rec, stop_rec))
    r1a=[xxx[start_col:end_col] for xxx in r1[start_rec:stop_rec]]
    r2a=[xxx[start_col:end_col] for xxx in r2[start_rec:stop_rec]]
    for x in difference.compare(r1a, r2a):
with open(val_input_set1, "r",encoding=encode) as f1,\
     open(val_input_set2, "r",encoding=encode) as f2,\
     open(val_output_set, "w", newline='') as f3:
    tsv_input_set1 = csv.reader(f1, delimiter="\t")
    tsv_input_set2 = csv.reader(f2, delimiter='\t')
    # r1 = list(itertools.islice(tsv_input_set1, start_rec, stop_rec))
    # r2 = list(itertools.islice(tsv_input_set2, start_rec, stop_rec))
    r1 = tsv_input_set1
    r2 = tsv_input_set2
    for x in difference.compare(r1, r2):
Error:
Traceback (most recent call last):
  File "c:\Users\gonks\Desktop\python_work\compare_list.py", line 53, in <module>
    for x in difference.compare(r1, r2):
  File "C:\Program Files\Python312\Lib\difflib.py", line 859, in compare
    cruncher = SequenceMatcher(self.linejunk, a, b)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\difflib.py", line 182, in __init__
    self.set_seqs(a, b)
  File "C:\Program Files\Python312\Lib\difflib.py", line 194, in set_seqs
    self.set_seq2(b)
  File "C:\Program Files\Python312\Lib\difflib.py", line 248, in set_seq2
    self.__chain_b()
  File "C:\Program Files\Python312\Lib\difflib.py", line 281, in __chain_b
    indices = b2j.setdefault(elt, [])
              ^^^^^^^^^^^^^^^^^^^^^^^
TypeError: unhashable type: 'list'
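The traceback points at the likely cause: csv.reader hands back each row as a list, and difflib uses every element of the two sequences as a dictionary key, which a list cannot be. One possible way around this, sketched below, is to slice each row to the fields of interest and join them back into a single string before comparing. This assumes difference is a difflib.Differ() instance (the traceback suggests so, but the post does not show it). The helper name select_fields, the parameters start_fld and end_fld, and the sample data are made up for illustration, and in-memory StringIO objects stand in for the two open files.

```python
import csv
import difflib
import io
import itertools


def select_fields(lines, start_rec, stop_rec, start_fld, end_fld, delimiter="\t"):
    """Return one string per row, built from fields start_fld..end_fld-1.

    csv.reader yields each row as a list (unhashable); joining the selected
    fields back into one string gives Differ elements it can hash.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    rows = itertools.islice(reader, start_rec, stop_rec)
    return [delimiter.join(row[start_fld:end_fld]) + "\n" for row in rows]


# Hypothetical sample data standing in for the two TSV files.
file1 = io.StringIO("id\tname\tcity\n1\tAnn\tOslo\n2\tBob\tRome\n")
file2 = io.StringIO("id\tname\tcity\n1\tAnn\tBergen\n2\tRob\tRome\n")

# Compare only fields 0 and 1 (id and name); the city field is ignored.
r1 = select_fields(file1, 0, 999999, 0, 2)
r2 = select_fields(file2, 0, 999999, 0, 2)

# Assumption: this is what "difference" refers to in the original script.
difference = difflib.Differ()
diff_lines = list(difference.compare(r1, r2))
for line in diff_lines:
    print(line, end="")
```

In the real script, f1 and f2 could be passed to select_fields in place of the StringIO objects. Note that start_fld and end_fld count fields (list elements), not characters, so they stay meaningful even when the tab-separated values have different widths.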