Nov-28-2024, 04:18 PM
Hello everyone,
I have noticed some very strange behavior with the hashlib library when computing md5 hashes (Python 3.11.9).
URLs are transferred via my application / API. Behind the URLs are assets (video, images or PDFs) that I have to download.
The md5 hash of the asset byte object is used to determine whether this is a new or existing asset. If this md5 hash already exists in the
DB, a reference is created for the webshop. Otherwise the asset is transferred to the shop as new. This is the simplified version of the project.
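To make the lookup described above concrete, here is a minimal sketch. It assumes a plain dict as a stand-in for the DB; `known_assets` and `register_asset` are hypothetical names for illustration, not the real project code.

```python
import hashlib

# Hypothetical in-memory stand-in for the DB table of known asset hashes.
# Maps md5 hex digest -> URL of the first asset seen with that digest.
known_assets: dict[str, str] = {}


def register_asset(data: bytes, url: str) -> str:
    """Return "reference" if identical bytes were seen before, else "new"."""
    digest = hashlib.md5(data).hexdigest()
    if digest in known_assets:
        # Same bytes already stored: only a reference would be created.
        return "reference"
    # Unknown digest: the asset would be transferred to the shop as new.
    known_assets[digest] = url
    return "new"
```

With this scheme, any change in the downloaded bytes, even a single byte, yields a different digest, so the asset is treated as new.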
Up to 10000 URLs can be transferred per API request.
Now I have noticed during the tests that assets are sometimes transferred as new, although they should already exist.
Here is a simplified code snippet with which I was able to narrow down the error:
    import hashlib

    import requests


    def get_md5_hash(url):
        asset_md5_hash = hashlib.md5()
        r = requests.get(url, allow_redirects=True, timeout=5, stream=True)
        try:
            r.raise_for_status()
        except Exception as error:
            print(f"an error occurred, error desc: '{error}'")
        else:
            r.raw.decode_content = True
            try:
                # Feed the body to the hash in 1 KiB chunks
                for line in r.iter_content(chunk_size=1024):
                    if line:
                        asset_md5_hash.update(line)
            except Exception as error:
                print(f"could not read bytes object, {error}")
            else:
                print(f"md5 hash: {asset_md5_hash.hexdigest()}")

If I run this function 100 times in a for loop, I get the hash "8c702e1eda4d55f4b11d1eabf7738a0e" 98 times and "46651ab690a01143cbb5279eabf0909a" 2 times.
    md5 hash: 8c702e1eda4d55f4b11d1eabf7738a0e
    md5 hash: 8c702e1eda4d55f4b11d1eabf7738a0e
    md5 hash: 8c702e1eda4d55f4b11d1eabf7738a0e
    md5 hash: 8c702e1eda4d55f4b11d1eabf7738a0e
    md5 hash: 8c702e1eda4d55f4b11d1eabf7738a0e
    md5 hash: 8c702e1eda4d55f4b11d1eabf7738a0e
    md5 hash: 8c702e1eda4d55f4b11d1eabf7738a0e
    md5 hash: 8c702e1eda4d55f4b11d1eabf7738a0e
    md5 hash: 8c702e1eda4d55f4b11d1eabf7738a0e
    […]
    md5 hash: 8c702e1eda4d55f4b11d1eabf7738a0e
    md5 hash: 46651ab690a01143cbb5279eabf0909a
    md5 hash: 8c702e1eda4d55f4b11d1eabf7738a0e
    md5 hash: 46651ab690a01143cbb5279eabf0909a
    md5 hash: 8c702e1eda4d55f4b11d1eabf7738a0e
    md5 hash: 8c702e1eda4d55f4b11d1eabf7738a0e
    md5 hash: 8c702e1eda4d55f4b11d1eabf7738a0e
    […]

How can this be?
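One way to narrow this down further might be to record how many bytes were actually hashed. This is a minimal diagnostic sketch, assuming the two odd hashes could come from a response body that differs from the usual one (for example a truncated download); that is only a guess, and nothing in the output above confirms it. `hash_chunks` and `expected_length` are hypothetical names introduced for illustration.

```python
import hashlib
from typing import Iterable, Optional


def hash_chunks(
    chunks: Iterable[bytes], expected_length: Optional[int] = None
) -> tuple[str, int]:
    """Hash an iterable of byte chunks and count the bytes consumed.

    If expected_length is given and does not match the number of bytes
    seen, raise ValueError instead of returning a digest of a partial body.
    """
    md5 = hashlib.md5()
    total = 0
    for chunk in chunks:
        md5.update(chunk)
        total += len(chunk)
    if expected_length is not None and total != expected_length:
        raise ValueError(
            f"incomplete body: got {total} of {expected_length} bytes"
        )
    return md5.hexdigest(), total
```

In the function above, this could be called as `hash_chunks(r.iter_content(chunk_size=1024), int(r.headers["Content-Length"]))`, assuming the server sends a `Content-Length` header and no `Content-Encoding` such as gzip (otherwise the header describes the compressed size and the comparison is not meaningful). If the byte count differs exactly on the runs with the other hash, the download rather than hashlib would be the likely cause.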
Do you see a possibility for a quick workaround?
Many thanks for your help