Oct-16-2016, 06:33 AM
if you know C (I expect the hash algorithm is written in C) you might want to take a look at the two algorithms.
Most that I have written or borrowed (I used a modified of Aho's from the dragon book) were actually quite simple,
usually fed a seed that was the size of the hash table, manipulating the key through an iterative process of masks and
bit shifts. only a few lines of code.
What you did with it afterwards is where it can get more complicated (although, with care, this can be simple as well). The one that I used for processing
a days worth of phone calls (~80 million calls) used a lateral extension, which was actually a linked list, when a collision was encountered. By using the
size of the table as part of the has, the distribution was very even. The linked list handling of collisions had the (very good) side effect of not running
out of space.
This algorithm could process (identify customer, distance between points, number of points, segment rating, etc.) in twenty minutes.
The lateral lists never got too long, so caused little delay.
I got into hashing in a big way, saving a few computer cycles on a single call really added up when you were processing so many.
Should you get interested and investigate the python hashes, I'd be interested in what you find.
Larz60+
Most that I have written or borrowed (I used a modified of Aho's from the dragon book) were actually quite simple,
usually fed a seed that was the size of the hash table, manipulating the key through an iterative process of masks and
bit shifts. only a few lines of code.
What you did with it afterwards is where it can get more complicated (although, with care, this can be simple as well). The one that I used for processing
a days worth of phone calls (~80 million calls) used a lateral extension, which was actually a linked list, when a collision was encountered. By using the
size of the table as part of the has, the distribution was very even. The linked list handling of collisions had the (very good) side effect of not running
out of space.
This algorithm could process (identify customer, distance between points, number of points, segment rating, etc.) in twenty minutes.
The lateral lists never got too long, so caused little delay.
I got into hashing in a big way, saving a few computer cycles on a single call really added up when you were processing so many.
Should you get interested and investigate the python hashes, I'd be interested in what you find.
Larz60+