Python Forum
How to compare specific elements of a TSV in difflib
#1
This compare routine works fine for constraining the comparison to specific rows and columns. However, the column limits are useless if I want to compare two TSV ("\t"-delimited) files, because the character columns are no longer reliably aligned. I tried reading the files in as TSVs, but difference.compare does not like this at all and raises an "unhashable type: 'list'" error. Can somebody advise what I need to do to convert my compare routine to work with CSV-type files, limited to specific elements of each row's list so that it provides functionality similar to the column parameters? Please assume you are dealing with a newb....

start_rec = 0
stop_rec  = 999999 
start_col = 0
end_col   = 999999 
############### END OF USER PARAMETERS ####################
encode = "utf-8"
cnt1=0
cnt2=0

with open(val_input_set1, "r",encoding=encode) as f1,\
     open(val_input_set2, "r",encoding=encode) as f2,\
     open(val_output_set, "w", newline='') as f3:
    
    r1 = list(itertools.islice(f1, start_rec, stop_rec))
    r2 = list(itertools.islice(f2, start_rec, stop_rec))

    # r1 and r2 were already limited to start_rec..stop_rec by islice above,
    # so only the column slice is applied here.
    r1a = [line[start_col:end_col] for line in r1]
    r2a = [line[start_col:end_col] for line in r2]

    for x in difference.compare(r1a, r2a):

# TSV attempt using csv.reader (this is the version that raises the error below):
with open(val_input_set1, "r", encoding=encode) as f1, \
     open(val_input_set2, "r", encoding=encode) as f2, \
     open(val_output_set, "w", newline='') as f3:

    tsv_input_set1 = csv.reader(f1, delimiter="\t")
    tsv_input_set2 = csv.reader(f2, delimiter="\t")

#    r1 = list(itertools.islice(tsv_input_set1, start_rec, stop_rec))
#    r2 = list(itertools.islice(tsv_input_set2, start_rec, stop_rec))

    r1 = tsv_input_set1
    r2 = tsv_input_set2

    for x in difference.compare(r1, r2):
Error:
Traceback (most recent call last):
  File "c:\Users\gonks\Desktop\python_work\compare_list.py", line 53, in <module>
    for x in difference.compare(r1, r2):
  File "C:\Program Files\Python312\Lib\difflib.py", line 859, in compare
    cruncher = SequenceMatcher(self.linejunk, a, b)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\difflib.py", line 182, in __init__
    self.set_seqs(a, b)
  File "C:\Program Files\Python312\Lib\difflib.py", line 194, in set_seqs
    self.set_seq2(b)
  File "C:\Program Files\Python312\Lib\difflib.py", line 248, in set_seq2
    self.__chain_b()
  File "C:\Program Files\Python312\Lib\difflib.py", line 281, in __chain_b
    indices = b2j.setdefault(elt, [])
              ^^^^^^^^^^^^^^^^^^^^^^^
TypeError: unhashable type: 'list'
#2
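difflib needs every item in the sequences it compares to be hashable, and csv.reader yields each row as a list, which is not. You could try this, perhaps, to replace the lists with tuples in the input sets (wrapped in list() because compare() needs sequences with a length, not one-shot iterators):
tsv_input_set1 = list(map(tuple, csv.reader(f1, delimiter="\t")))
tsv_input_set2 = list(map(tuple, csv.reader(f2, delimiter="\t")))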
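To restrict the comparison to specific fields, one option is to pick the wanted columns from each parsed row and join them back into a single string before diffing. Strings are hashable, and Differ builds its output for strings, so this avoids the unhashable-list error and, as far as I can tell, the formatting problems that non-string rows can still run into. A minimal sketch, assuming in-memory sample data in place of the real files; select_fields and its fields parameter are hypothetical names, not part of difflib or csv:

```python
import csv
import difflib
import io
import itertools


def select_fields(rows, start_rec=0, stop_rec=None, fields=None):
    """Return one tab-joined string per row, keeping only the chosen fields.

    rows   -- any iterable of lists, e.g. a csv.reader
    fields -- list of field indexes to keep (None keeps every field)
    """
    selected = []
    for row in itertools.islice(rows, start_rec, stop_rec):
        if fields is None:
            kept = row
        else:
            # Skip indexes that a short row does not have.
            kept = [row[i] for i in fields if i < len(row)]
        selected.append("\t".join(kept))
    return selected


# Stand-in data; with real files, pass the open file objects to csv.reader.
file1 = io.StringIO("id\tname\tcity\n1\tAnn\tOslo\n2\tBob\tRome\n")
file2 = io.StringIO("id\tname\tcity\n1\tAnn\tParis\n2\tRob\tRome\n")

# Compare only fields 0 and 1, so differences in the city field are ignored.
rows1 = select_fields(csv.reader(file1, delimiter="\t"), fields=[0, 1])
rows2 = select_fields(csv.reader(file2, delimiter="\t"), fields=[0, 1])

diff_lines = list(difflib.Differ().compare(rows1, rows2))
for line in diff_lines:
    print(line)
```

The start_rec and stop_rec arguments play the same role as in the original routine, while fields replaces the character-based start_col and end_col with field positions.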
« We can solve any problem by introducing an extra level of indirection »