Hey,
So I'm working on a sort of distributed build system. The system allows execution of snippets of scripts as build steps. I need to be able to hash these snippets of code in a way that comments and doc strings don't affect the hash. I'm getting part way there by using the ast module to parse the code, then doing an ast.dump and hashing the resulting string. The natural next step for me is to clear the first Expr(str()) node in the body of every FunctionDef node.
Is there a better way to solve this problem? I can't help think that this is a problem that must have been solved many times already, but I cant find anything on stack overflow.
Thanks again for any help,
Luke
Hey so this is the code I have at the moment for hashing a function.
So I'm working on a sort of distributed build system. The system allows execution of snippets of scripts as build steps. I need to be able to hash these snippets of code in a way that comments and doc strings don't affect the hash. I'm getting part way there by using the ast module to parse the code, then doing an ast.dump and hashing the resulting string. The natural next step for me is to clear the first Expr(str()) node in the body of every FunctionDef node.
Is there a better way to solve this problem? I can't help think that this is a problem that must have been solved many times already, but I cant find anything on stack overflow.
Thanks again for any help,
Luke
Hey so this is the code I have at the moment for hashing a function.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 |
import ast import hashlib import inspect def _remove_docstring(node): ''' Removes all the doc strings in a FunctionDef or ClassDef as node. Arguments: node (ast.FunctionDef or ast.ClassDef): The node whose docstrings to remove. ''' if not ( isinstance (node, ast.FunctionDef) or isinstance (node, ast.ClassDef)): return if len (node.body) ! = 0 : docstr = node.body[ 0 ] if isinstance (docstr, ast.Expr) and isinstance (docstr.value, ast. Str ): node.body.pop( 0 ) #------------------------------------------------------------------------------- def hash_function(func): ''' Produces a hash for the code in the given function. Arguments: func (types.FunctionObject): The function to produce a hash for ''' func_str = inspect.getsource(func) module = ast.parse(func_str) assert len (module.body) = = 1 and isinstance (module.body[ 0 ], ast.FunctionDef) # Clear function name so it doesn't affect the hash func_node = module.body[ 0 ] func_node.name = "" # Clear all the doc strings for node in ast.walk(module): _remove_docstring(node) # Convert the ast to a string for hashing ast_str = ast.dump(module, annotate_fields = False ) # Produce the hash fhash = hashlib.sha256(ast_str) result = fhash.hexdigest() return result #------------------------------------------------------------------------------- # Function 1 def test(blah): 'This is a test' class Test( object ): ''' My test class ''' print blah def sub_function(foo): '''arg''' print hash_function(test) #------------------------------------------------------------------------------- # Function 2 def test2(blah): 'This is a test' class Test( object ): ''' My test class ''' print blah def sub_function(foo): '''arg meh''' print hash_function(test2) |