Jan-05-2018, 07:41 PM
(This post was last modified: Jan-05-2018, 07:42 PM by Gribouillis.)
If you are using Python, the simplest storage until you have found the duplicates is a pickle file containing a list of tuples (or namedtuples). Once the duplicates are found, you can write a new Excel file.
The reason for this is that Python is slow at reading Excel files, whereas loading such a pickle file with 30000 records takes a fraction of a second.
That said, without more information about the contents of the data, it is difficult to devise a good strategy.
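A minimal sketch of the idea, assuming a hypothetical `Record` layout (the real column names would come from your spreadsheet): cache the rows once in a pickle file, reload them quickly on later runs, and detect duplicates with a set of already-seen rows.

```python
import pickle
from collections import namedtuple

# Hypothetical record layout; adapt the fields to your actual columns.
Record = namedtuple("Record", ["name", "email", "amount"])

# In practice these rows would come from reading the Excel file once.
records = [
    Record("Alice", "alice@example.com", 10),
    Record("Bob", "bob@example.com", 20),
    Record("Alice", "alice@example.com", 10),  # a duplicate row
]

# Cache the rows in a pickle file; reloading this is far faster
# than re-parsing the original Excel file on every run.
with open("records.pkl", "wb") as f:
    pickle.dump(records, f)

with open("records.pkl", "rb") as f:
    loaded = pickle.load(f)

# Collect duplicates by remembering which rows have been seen.
seen = set()
duplicates = []
for rec in loaded:
    if rec in seen:
        duplicates.append(rec)
    else:
        seen.add(rec)

print(duplicates)
```

Since namedtuples are hashable and compare by value, the set lookup treats two rows with identical fields as the same record; from there you can write the de-duplicated `seen` rows (or flag the `duplicates`) back out to a new Excel file with whatever library you used to read the original.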