Python Forum

fast hash function
hi guys,
i'm looking for a faster-than-the-default-adler32/md5/etc non-crypto hash function.
i have googled and found things like pyfasthash, xxhash, etc., but they only appear to be available in C source form. not being a C developer, and not wanting to acquire the baggage that entails, i would much prefer something that i could pip install.
any ideas?
Use %timeit in IPython to compare them. Here is an example using pyhashxx, which can be installed with pip:

pip install pyhashxx
some additional info on various algs: https://cyan4973.github.io/xxHash/
Sample script:
# in ipython
from pyhashxx import hashxx
import hashlib

def hashxx_test(x):
    result = hashxx(x)

def hashlib_md5_test(x):
    m = hashlib.md5()
    m.update(x)

# bytes literals, so the script also runs on Python 3 (hashlib rejects str there)
test_str_one = b"Hello World!"
test_str_two = b"ABCD1234"*2**17  # 1MB
test_str_three = b"ABCD1234"*2**23  # 64MB

# 12 Bytes
%timeit hashlib_md5_test(test_str_one)
# 1670 ns
%timeit hashxx_test(test_str_one)
# 359 ns (4.7x speedup)

# 1MB
%timeit hashlib_md5_test(test_str_two)
# 3840 us
%timeit hashxx_test(test_str_two)
# 417 us (9.2x speedup)

# 64MB
%timeit hashlib_md5_test(test_str_three)
# 225 ms
%timeit hashxx_test(test_str_three)
# 29.6 ms (7.6x speedup)
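For anyone not using IPython, the same kind of comparison can be sketched with the standard-library timeit module. The helper name time_hash and the candidates dictionary are illustrative assumptions, not part of the script above. The sketch only covers hashes that ship with Python (hashlib.md5, zlib.adler32, zlib.crc32), so a third-party hash such as hashxx would have to be added to the dictionary by hand.

```python
import hashlib
import timeit
import zlib


def time_hash(func, data, number=1000):
    """Return the best average time, in seconds, of one func(data) call."""
    timer = timeit.Timer(lambda: func(data))
    # Take the fastest of three runs to reduce noise from other processes.
    return min(timer.repeat(repeat=3, number=number)) / number


data = b"ABCD1234" * 2**17  # 1MB of input, as in the script above

# Hypothetical set of candidates; add any other callable that takes bytes.
candidates = {
    "md5": lambda d: hashlib.md5(d).digest(),
    "adler32": zlib.adler32,
    "crc32": zlib.crc32,
}

for name, func in candidates.items():
    seconds = time_hash(func, data, number=20)
    print(f"{name:<8} {seconds * 1e6:.1f} us per call")
```

The absolute numbers depend on the machine and Python build, so only the ratios between candidates are worth comparing.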
Many, if not most, hash functions involve a lot of bit shifting and masking, which is why most are written in C.
Any book on compiler design or algorithms will include many examples, but usually in C or Smalltalk.
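To give an idea of what that shifting and masking looks like, here is a minimal pure-Python sketch of Bob Jenkins's one-at-a-time hash. It is picked only as a short, well-known illustration and is not the algorithm used by xxHash or pyhashxx. The explicit 0xFFFFFFFF masks stand in for the 32-bit wraparound that C gives for free, which is part of why a version like this will be far slower than a C extension.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def one_at_a_time(data):
    """Jenkins one-at-a-time hash of a bytes object, as a 32-bit integer."""
    h = 0
    for byte in bytearray(data):  # bytearray yields ints on Python 2 and 3
        h = (h + byte) & MASK32
        h = (h + (h << 10)) & MASK32
        h ^= h >> 6
    # Final mixing steps so the last bytes affect all output bits.
    h = (h + (h << 3)) & MASK32
    h ^= h >> 11
    h = (h + (h << 15)) & MASK32
    return h


print(hex(one_at_a_time(b"a")))  # 0xca2e9442
```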
thanks, guys.