Jan-05-2018, 09:03 PM
(Jan-05-2018, 07:41 PM)Gribouillis Wrote: If you are using Python, the simplest storage until you have found the duplicates is a pickle file containing a list of tuples (or namedtuples). Once the duplicates are found, you can write a new Excel file.
The reason for this is that Python is slow when it reads Excel files, while loading such a pickle file with 30000 records takes a fraction of a second.
That said, without more information about the contents of the data, it is difficult to work out a good strategy.
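For anyone who hasn't used pickle before, here is a minimal sketch of the approach described above. The sample rows, the file name `records_cache.pkl`, and the helper names are made up for illustration, and it assumes a "duplicate" means two rows that are identical in every field, which may not match the real data.

```python
import pickle
import tempfile
from collections import Counter
from pathlib import Path

# Hypothetical rows standing in for data already read once from the Excel file.
records = [
    ("alice", "alice@example.com"),
    ("bob", "bob@example.com"),
    ("alice", "alice@example.com"),
]


def save_records(path, rows):
    """Write the list of tuples to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(rows, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_records(path):
    """Read the list of tuples back from the pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


def find_duplicates(rows):
    """Return each row that occurs more than once (assumes exact matches)."""
    counts = Counter(rows)
    return [row for row, n in counts.items() if n > 1]


# Illustrative cache location in the system temp directory.
cache_path = Path(tempfile.gettempdir()) / "records_cache.pkl"
save_records(cache_path, records)

loaded = load_records(cache_path)
duplicates = find_duplicates(loaded)
print(duplicates)
```

Only unpickle files you created yourself, since loading a pickle from an untrusted source can run arbitrary code.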
Oh wow, ok, I'll look into that. I'm not familiar with pickle files, but I'm sure a quick Google will tell me everything I need to know. Thanks!