Module for procedural generation with hashes

PhilHibbs · Nov-03-2017, 11:16 AM

I am working on a Python library to replace use of the random number generator with a hash-based system, so that seeded sequences are no longer fragile when new calls to the RNG are inserted.
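To illustrate the fragility being avoided, here is a minimal Python 3 sketch. keyed_uniform is a hypothetical helper, not the pghash API, and md5 from hashlib is used only as a convenient stand-in hash:

```python
import hashlib
import random


def keyed_uniform(*key):
    # Hypothetical helper: hash the key tuple and scale the top 53 bits
    # of the digest to a float in [0, 1).
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return (int.from_bytes(digest, "big") >> 75) / 2 ** 53


# Sequential RNG: inserting one extra call shifts every later value.
rng = random.Random(42)
original = [rng.random() for _ in range(3)]
rng = random.Random(42)
rng.random()                        # the newly inserted call
shifted = [rng.random() for _ in range(3)]
print(shifted[:2] == original[1:])  # True: everything moved along by one

# Keyed hashing: each value depends only on its own key, so new calls
# elsewhere cannot disturb it.
a = keyed_uniform(42, "tree", 3, 4)
keyed_uniform(42, "rock", 9, 9)     # the newly inserted call
print(a == keyed_uniform(42, "tree", 3, 4))  # True
```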

https://github.com/UKHomeOffice/python-pghash

The basic features are complete and there are no known bugs. I'm calling it version 0.0.1 for now in case I need to make any changes that break seed-compatibility; I'm hoping I will never need to do that.

PhilHibbs · (This post was last modified: Dec-13-2017, 03:12 PM by PhilHibbs.)

I have found a flaw that will break seed-compatibility.

If the tuple contains tuples, the join will not be deep. With semicolon separators, (1,(2,3)) is flattened to the string "1;(2, 3)", because the join does not descend into the nested tuple; it just uses the default stringification, which is comma-and-space separated. The parentheses are not controllable either, so this creates a key clash with the tuple (1,"(2, 3)").
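To make the clash concrete, here is a minimal Python 3 reproduction. It assumes the old behaviour was a plain sep.join(map(str, t)) over the key tuple; flat_key is a hypothetical name, and md5 from hashlib stands in for whichever hash the library uses:

```python
import hashlib


def flat_key(t, sep=";"):
    # Old-style, non-recursive join: nested tuples fall back to str(),
    # which renders them as "(2, 3)" with comma-and-space separators.
    return sep.join(map(str, t))


a = flat_key((1, (2, 3)))
b = flat_key((1, "(2, 3)"))
print(a, b)  # 1;(2, 3) 1;(2, 3)

# Identical strings mean identical hashes, i.e. a key clash.
print(hashlib.md5(a.encode()).hexdigest() == hashlib.md5(b.encode()).hexdigest())  # True
```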

I am going to fix this by doing a recursive flatten using custom parentheses and separators so that (1,(2,3)) can map to "1;{2;3}" and (1,"(2, 3)") to "1;(2, 3)" avoiding the hash clash.
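One possible shape for that recursive flatten, as a sketch only (stringify and key_string are hypothetical names, not part of pghash): tuples and lists are bracketed and joined recursively, and everything else, including strings, is a leaf rendered with str().

```python
def stringify(value, open_="{", sep=";", close="}"):
    """Recursively render nested tuples/lists with custom brackets.

    Anything that is not a tuple or list (including strings) is a leaf
    rendered with str(), so a string such as "(2, 3)" is left untouched.
    """
    if isinstance(value, (tuple, list)):
        inner = sep.join(stringify(v, open_, sep, close) for v in value)
        return open_ + inner + close
    return str(value)


def key_string(t, open_="{", sep=";", close="}"):
    # The outermost tuple is joined without brackets, giving "1;{2;3}".
    return sep.join(stringify(v, open_, sep, close) for v in t)


print(key_string((1, (2, 3))))    # 1;{2;3}
print(key_string((1, "(2, 3)")))  # 1;(2, 3)
```

Like the scheme described above, this can still collide if a string leaf happens to contain the chosen brackets and separator: (1, "{2;3}") also renders as 1;{2;3}.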

Suggestions are welcome as to how to implement a recursive stringification!
https://stackoverflow.com/questions/4779...o-a-string

PhilHibbs · (This post was last modified: Dec-13-2017, 04:30 PM by PhilHibbs.)

Ok, I've come up with a fix that in theory can maintain seed-compatibility.

The interface will change: instead of accepting a separator, it will accept a lambda function that stringifies the tuple.

Compatibility with the existing interface can be maintained with this:

pghash.pghgen(seed=seed,joiner=lambda t: ','.join(map(str,t)))

The default lambda is just the str function. A custom lambda that uses curly braces and semicolons would look like this:

f1 = lambda t: '{' + ';'.join(map(str, t)) + '}' if isinstance(t, (tuple, list)) else str(t)
m1 = lambda t: ';'.join(map(f1, t))
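A note on the type test in f1: under Python 3, strings have an __iter__ method too, so a hasattr(t, '__iter__') check treats a string leaf such as "(2, 3)" as a sequence and splits it into characters, whereas an isinstance check against tuple and list leaves it whole. A small comparison, assuming Python 3 (f_hasattr, f_isinstance and m are illustrative names):

```python
def f_hasattr(t):
    # hasattr-based test: in Python 3, str also has __iter__.
    return '{' + ';'.join(map(str, t)) + '}' if hasattr(t, '__iter__') else str(t)


def f_isinstance(t):
    # Only tuples and lists are treated as nested sequences.
    return '{' + ';'.join(map(str, t)) + '}' if isinstance(t, (tuple, list)) else str(t)


def m(t, f):
    # Outer join, equivalent to m1 with the inner function passed in.
    return ';'.join(map(f, t))


print(m((1, "(2, 3)"), f_hasattr))     # 1;{(;2;,; ;3;)}
print(m((1, "(2, 3)"), f_isinstance))  # 1;(2, 3)
print(m((1, (2, 3)), f_isinstance))    # 1;{2;3}
```

Both versions go only one level deep, as f1 does: (1, ((2, 3), 4)) renders as 1;{(2, 3);4}.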

I wonder if the rather ugly lambda syntax would be a barrier to some. Maybe it should just accept the separator and bracket characters and build the lambda itself? Maybe as an alternative interface, so that you can still specify your own stringification lambda?

pghash.pghgen(seed=seed,lamb='{;}')

PhilHibbs · (This post was last modified: Dec-14-2017, 09:13 AM by PhilHibbs.)

I think this would be better:

pghash.pghgen(seed=seed,seps=('{',';','}'))

Or is it bad form to wedge three related parameters into a tuple?

PhilHibbs · Dec-14-2017, 11:10 AM

Version 0.0.2 released

Instead of a separator, it accepts an optional parameter called "joiner", which can be a lambda function, a tuple of three strings giving the opening bracket, separator and closing bracket, or a three-character string of the same.
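A sketch of how one function could accept all three forms, using hypothetical names (make_joiner, normalise_joiner) rather than pghash's actual code. It relies on the fact that unpacking a three-character string and unpacking a three-element tuple look the same in Python:

```python
def make_joiner(open_, sep, close):
    """Build a function that renders a (possibly nested) key tuple."""
    def render(value):
        if isinstance(value, (tuple, list)):
            return open_ + sep.join(render(v) for v in value) + close
        return str(value)
    return render


def normalise_joiner(joiner=None):
    if joiner is None:
        return str                      # default: plain str() of the tuple
    if callable(joiner):
        return joiner                   # caller supplied their own function
    if len(joiner) != 3:
        raise ValueError("joiner must be callable or have exactly three parts")
    # Unpacking works the same for '{:}' and ('{', ':', '}').
    open_, sep, close = joiner
    return make_joiner(open_, sep, close)


print(normalise_joiner('{:}')((1, (2, 3))))            # {1:{2:3}}
print(normalise_joiner(('{', ':', '}'))((1, (2, 3))))  # {1:{2:3}}
```

In this sketch the outer tuple is bracketed too, so ('(', ', ', ')') reproduces str() for nested tuples of numbers; str() also quotes strings and writes one-element tuples as (1,), which this simple version does not.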

These two are equivalent:

pg=pghash.pghgen(seed=123,joiner='{:}')
pg=pghash.pghgen(seed=123,joiner=('{',':','}'))

These are all equivalent:

pg=pghash.pghgen(seed=123)
pg=pghash.pghgen(seed=123,joiner=('(',', ',')'))
pg=pghash.pghgen(seed=123,joiner=lambda t: str(t))

This is equivalent to the old default, where sep is the separator string the old interface accepted:

pg=pghash.pghgen(seed=123,joiner=lambda t: sep.join(map(str,t)))

I don't actually care about the old default any more, and no-one else has used this code yet to my knowledge, but I'm happy that I can maintain seed-compatibility. It's a good omen.

PhilHibbs · (This post was last modified: Dec-19-2017, 05:03 PM by PhilHibbs.)

Version 0.0.2:
https://github.com/UKHomeOffice/python-p....0.2-alpha

I'm thinking of removing the lambda. I'm not using it, and I can no longer see a need for it. I thought it was necessary because my initial stringification mechanism didn't automatically quote strings, so (1,'(2, 3)') would stringify to the same as (1,(2,3)).

Now that I am using the str() function as the default lambda, this is no longer the case: you get (1, '(2, 3)') and (1, (2, 3)), which are distinct.
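A quick check of that under Python 3, using md5 from hashlib as a stand-in hash (key_hash is a hypothetical helper):

```python
import hashlib


def key_hash(t):
    # str() of a tuple uses repr() for each element, so strings keep quotes.
    return hashlib.md5(str(t).encode("utf-8")).hexdigest()


print(str((1, "(2, 3)")))  # (1, '(2, 3)')
print(str((1, (2, 3))))    # (1, (2, 3))
print(key_hash((1, "(2, 3)")) == key_hash((1, (2, 3))))  # False
```

The trade-off is that keys now depend on repr() formatting, so a type whose repr differs between Python versions (Python 2 writes unicode strings as u'...', for example) would hash differently.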

PhilHibbs · (This post was last modified: Dec-20-2017, 09:55 AM by PhilHibbs.)

Version 0.0.3 will use the built-in md5 hash instead of xxhash. Although xxhash seems to win in a direct performance test, in practice it makes very little difference, and md5 actually seems to be faster in real-world use. For me, it is not worth the price of having to install xxhash, which also needs a C++ compiler. Requiring a C++ compiler just to run my Python code is a fight I don't want to have with change management.

This, of course, will totally trash seed-compatibility. But that's what 0.0.x versions are for. Oh well.

I suppose I could take the hash function as a lambda, and then the application could import whichever hash library it wants. pghash would need to be sensitive to the length of the hash returned though, as xxhash returns 64 bits and md5 returns 128, and that's an extra test I don't want to have to do.
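On the length problem: if the hash is passed in as a function, the digest length can be read from the digest itself rather than tested per algorithm. This is a sketch assuming a hashlib-style constructor (called with bytes, returning an object with .digest()); uniform_from is a hypothetical name, and blake2b truncated to 8 bytes stands in for a 64-bit hash such as xxhash:

```python
import functools
import hashlib


def uniform_from(key, hash_factory=hashlib.md5):
    """Map a key tuple to a float in [0, 1) with any hashlib-style hash."""
    digest = hash_factory(str(key).encode("utf-8")).digest()
    bits = len(digest) * 8            # 64 for an 8-byte digest, 128 for md5
    n = int.from_bytes(digest, "big")
    if bits > 53:
        # Keep the top 53 bits (a double's precision) so the result is < 1.0.
        return (n >> (bits - 53)) / 2 ** 53
    return n / 2 ** bits


hash64 = functools.partial(hashlib.blake2b, digest_size=8)  # 64-bit digest
print(uniform_from((123, "tree"), hashlib.md5))
print(uniform_from((123, "tree"), hash64))
```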

PhilHibbs · (This post was last modified: Jan-29-2018, 03:20 PM by PhilHibbs.)

Version 0.0.3 is out: https://github.com/UKHomeOffice/python-p....0.3-alpha

This removes the requirement for the xxhash module, and uses the in-built md5 hash instead. This also enables seedless hashing so the seed in the main pghash class is now redundant. You can still specify a seed to the pghgen wrapper, and this will be tupled up with every call to the pghash class that uses the wrapper.
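The seed-tupling idea can be sketched like this. SeededGenerator and its random method are hypothetical names standing in for a pghgen-style wrapper, and md5 from hashlib is assumed as the underlying seedless hash:

```python
import hashlib


class SeededGenerator:
    """Hypothetical stand-in for a pghgen-style wrapper, not pghash's API.

    The hash itself takes no seed; the wrapper simply puts its seed at the
    front of every key tuple before hashing.
    """

    def __init__(self, seed):
        self.seed = seed

    def random(self, *key):
        full_key = (self.seed,) + key
        digest = hashlib.md5(str(full_key).encode("utf-8")).digest()
        # Top 53 bits of the digest, scaled to a float in [0, 1).
        return (int.from_bytes(digest, "big") >> 75) / 2 ** 53


world_a = SeededGenerator(123)
world_b = SeededGenerator(456)
print(world_a.random("tree", 5, 7))
print(world_b.random("tree", 5, 7))  # different seed, different value
```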

Also I fixed a few minor issues with the README and init file documentation.

A side benefit of md5 is that it returns higher-resolution hashes (128 bits rather than xxhash's 64), so Gaussian distributions will be smoother and have a greater range.
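One way to see why more hash bits help with Gaussians: if a normal variate is produced by pushing a hashed uniform through the inverse normal CDF, the smallest non-zero probability the hash can express sets how far into the tails the result can reach. This is a sketch assuming Python 3.8+ (for statistics.NormalDist); hash_gaussian is a hypothetical name, and the inverse-CDF method is an illustrative choice, not necessarily what pghash does. One bit picks the tail and the rest pick the depth, because a double cannot represent values just below 1.0 finely enough to reach the upper tail directly.

```python
import hashlib
from statistics import NormalDist  # Python 3.8+


def hash_gaussian(key, bits=128):
    """Deterministic standard normal variate from the top `bits` of an md5 hash."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    n = int.from_bytes(digest, "big") >> (128 - bits)
    negative = n >> (bits - 1)            # top bit selects the tail
    m = n & ((1 << (bits - 1)) - 1)       # remaining bits select the depth
    # Midpoint of one of 2**(bits - 1) equal slices of (0, 0.5): never 0.
    u = (2 * m + 1) / 2 ** (bits + 1)
    z = -NormalDist().inv_cdf(u)          # z >= 0 because u <= 0.5
    return -z if negative else z


print(hash_gaussian(("tree", 5, 7)))
print(hash_gaussian(("tree", 5, 7), bits=64))
```

With 64 bits the most extreme output is roughly 9 standard deviations from the mean; with 128 bits it is roughly 13.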

PhilHibbs · (This post was last modified: Feb-28-2018, 02:07 PM by PhilHibbs.)

I've added some test scripts. Nothing very formal or automated.

Could someone please help me out with packaging it? I'm trying to figure out what I need to do to upload it to PyPI, or wherever the current best repository for packages is.
