Python Forum

How to compare specific elements of a TSV in difflib
This compare routine works fine when I constrain the comparison to specific rows and character columns. However, the column limits are useless when I compare two TSV (tab-delimited, "\t") files, because the fields no longer sit at reliably aligned character positions. I tried reading the files in as TSV with csv.reader, but difference.compare does not like this at all and raises TypeError: unhashable type: 'list'. Can somebody advise what I need to change so my compare routine works with CSV-type files, limited to specific fields in each row (to provide functionality similar to the column parameters)? Please assume you are dealing with a newbie.

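import difflib
import itertools

# val_input_set1, val_input_set2 and val_output_set (file paths) are further
# user parameters that are not shown in the post
start_rec = 0
stop_rec  = 999999
start_col = 0
end_col   = 999999
############### END OF USER PARAMETERS ####################
encode = "utf-8"
cnt1 = 0
cnt2 = 0
# assumed: the traceback shows difference.compare is difflib's Differ.compare
difference = difflib.Differ()

with open(val_input_set1, "r", encoding=encode) as f1,\
     open(val_input_set2, "r", encoding=encode) as f2,\
     open(val_output_set, "w", encoding=encode, newline='') as f3:

    # islice already keeps only rows start_rec..stop_rec-1 ...
    r1 = list(itertools.islice(f1, start_rec, stop_rec))
    r2 = list(itertools.islice(f2, start_rec, stop_rec))

    # ... so slicing r1/r2 by rows again would skip start_rec rows twice;
    # only the character-column limits are applied here
    r1a = [line[start_col:end_col] for line in r1]
    r2a = [line[start_col:end_col] for line in r2]

    for x in difference.compare(r1a, r2a):
        f3.write(x)  # loop body not shown in the post; writing the delta to f3 is assumed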
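import csv

with open(val_input_set1, "r", encoding=encode, newline='') as f1,\
     open(val_input_set2, "r", encoding=encode, newline='') as f2,\
     open(val_output_set, "w", encoding=encode, newline='') as f3:

    tsv_input_set1 = csv.reader(f1, delimiter="\t")
    tsv_input_set2 = csv.reader(f2, delimiter="\t")

    # r1 = list(itertools.islice(tsv_input_set1, start_rec, stop_rec))
    # r2 = list(itertools.islice(tsv_input_set2, start_rec, stop_rec))

    # csv.reader yields each row as a list of fields; difflib uses every row
    # as a dictionary key, and lists are unhashable, hence the TypeError below
    r1 = tsv_input_set1
    r2 = tsv_input_set2

    for x in difference.compare(r1, r2):
        f3.write(x)  # loop body not shown in the post; writing the delta to f3 is assumed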
Error:
Traceback (most recent call last):
  File "c:\Users\gonks\Desktop\python_work\compare_list.py", line 53, in <module>
    for x in difference.compare(r1, r2):
  File "C:\Program Files\Python312\Lib\difflib.py", line 859, in compare
    cruncher = SequenceMatcher(self.linejunk, a, b)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\difflib.py", line 182, in __init__
    self.set_seqs(a, b)
  File "C:\Program Files\Python312\Lib\difflib.py", line 194, in set_seqs
    self.set_seq2(b)
  File "C:\Program Files\Python312\Lib\difflib.py", line 248, in set_seq2
    self.__chain_b()
  File "C:\Program Files\Python312\Lib\difflib.py", line 281, in __chain_b
    indices = b2j.setdefault(elt, [])
              ^^^^^^^^^^^^^^^^^^^^^^^
TypeError: unhashable type: 'list'
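A minimal reproduction of this error, using made-up rows rather than the poster's files: Differ.compare builds a SequenceMatcher, which stores each element of the second sequence as a dictionary key (the b2j.setdefault call in the traceback). Rows from csv.reader are lists, which are unhashable. The same rows converted to tuples work, provided they are held in a real list, since SequenceMatcher also needs to take len() of its inputs.

```python
import difflib

# Made-up rows, already split into fields the way csv.reader would yield them
rows_a = [["id", "name"], ["1", "apple"]]
rows_b = [["id", "name"], ["1", "apricot"]]

# Lists are unhashable, so building the matcher fails as in the traceback
list_error = None
try:
    difflib.SequenceMatcher(None, rows_a, rows_b)
except TypeError as exc:
    list_error = exc
print("lists fail:", list_error)

# Tuples are hashable; list() turns the one-shot map iterator into a sequence
tuples_a = list(map(tuple, rows_a))
tuples_b = list(map(tuple, rows_b))
matcher = difflib.SequenceMatcher(None, tuples_a, tuples_b)
print(matcher.get_opcodes())  # first row equal, second row replaced
```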
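You could try replacing the lists with tuples in the input sets, since tuples are hashable. Wrap the result in list() as well: difflib needs real sequences that it can index and take len() of, not one-shot map iterators.
tsv_input_set1 = list(map(tuple, csv.reader(f1, delimiter="\t")))
tsv_input_set2 = list(map(tuple, csv.reader(f2, delimiter="\t")))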
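One way to get the field-limited comparison asked for, sketched under some assumptions: select_fields and diff_tsv are hypothetical helper names, start_col/end_col are reinterpreted as field indices rather than character positions, and the sample data is made up. Instead of passing tuples to Differ, it joins the selected fields back into one tab-separated string per row, because the difflib documentation says Differ.compare expects sequences of single-line strings ending with newlines.

```python
import csv
import difflib
import io
import itertools


def select_fields(reader, start_rec, stop_rec, start_col, end_col):
    """Hypothetical helper: keep rows start_rec..stop_rec-1 and fields
    start_col..end_col-1 of each row, rejoined into one tab-separated line."""
    lines = []
    for row in itertools.islice(reader, start_rec, stop_rec):
        # Differ.compare works on strings ending in "\n", so build those
        lines.append("\t".join(row[start_col:end_col]) + "\n")
    return lines


def diff_tsv(f1, f2, out, start_rec=0, stop_rec=None, start_col=0, end_col=None):
    """Hypothetical helper: write a difflib.Differ delta of two TSV files,
    restricted to the chosen rows and fields, to the writable file out."""
    r1 = select_fields(csv.reader(f1, delimiter="\t"), start_rec, stop_rec, start_col, end_col)
    r2 = select_fields(csv.reader(f2, delimiter="\t"), start_rec, stop_rec, start_col, end_col)
    out.writelines(difflib.Differ().compare(r1, r2))


# Usage with in-memory files standing in for the real TSV files
file1 = io.StringIO("id\tname\tprice\n1\tapple\t0.50\n2\tpear\t0.75\n")
file2 = io.StringIO("id\tname\tprice\n1\tapple\t0.55\n2\tplum\t0.75\n")
out = io.StringIO()
# Compare only fields 0 and 1 (id and name); the price field is ignored
diff_tsv(file1, file2, out, start_col=0, end_col=2)
print(out.getvalue(), end="")
```

With the poster's parameters, the call would be diff_tsv(f1, f2, f3, start_rec, stop_rec, start_col, end_col) inside the existing with block, with f1 and f2 opened using newline="" as the csv module documentation recommends. In the sample, row 1 is reported as unchanged because the price field is excluded, and row 2 appears as a "-"/"+" pair.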