Apr-13-2019, 05:20 PM
Thanks for the tip. However, the compression and SHA-256 hashing are essential functionality and can't be avoided.
My concern was that each chunk has already been hashed once with SHA-256, and using a dict adds two or three more hashing operations on top of that. My understanding is that Python's built-in hash() is a fast, non-cryptographic hash, so unfortunately it isn't fit for strong verification or deduplication, and I can't substitute the dict's internal hashes for those program functions.
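One thing that might keep the dict overhead negligible: if the dict (or set) is keyed by the 32-byte SHA-256 digest rather than by the raw chunk, the built-in hash() only ever runs over 32 bytes per lookup, which is tiny next to hashing the 64 KB chunk itself. Roughly something like this (dedupe_chunks and the chunks iterable are just illustrative names, not my actual code):

import hashlib

def dedupe_chunks(chunks):
    """Yield (digest, chunk) for each chunk not seen before."""
    seen = set()  # holds 32-byte SHA-256 digests, not the 64 KB chunks
    for chunk in chunks:
        digest = hashlib.sha256(chunk).digest()  # the hash I already compute
        if digest not in seen:   # dict/set hashing runs over 32 bytes only
            seen.add(digest)
            yield digest, chunk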
I will do some testing to see whether the extra hashing imposed by the dict incurs a significant penalty. But consider the number of iterations involved: for a 2 TB volume, processing with a chunk size of 64 KB means roughly 33 million iterations (2 TiB / 64 KiB = 2^25).
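For the testing, a quick timeit micro-benchmark should bound the per-chunk dict cost against the SHA-256 cost I'm already paying. A rough sketch (the 64 KiB chunk and the loop count are arbitrary):

import hashlib, os, timeit

chunk = os.urandom(64 * 1024)            # one 64 KiB chunk of random data
digest = hashlib.sha256(chunk).digest()  # 32-byte key
seen = {digest: None}                    # stand-in for the dedup dict

n = 100_000
t_sha = timeit.timeit(lambda: hashlib.sha256(chunk).digest(), number=n)
t_key = timeit.timeit(lambda: digest in seen, number=n)

print(f"sha256 over 64 KiB chunk:     {t_sha / n * 1e6:.2f} us/op")
print(f"dict hash + lookup of digest: {t_key / n * 1e6:.2f} us/op")

If the second number is a small fraction of the first, the dict's extra hashing won't matter even over tens of millions of chunks.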