Python Forum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
hashing a function
#1
Hey,
So I'm working on a sort of distributed build system. The system allows execution of snippets of scripts as build steps. I need to be able to hash these snippets of code in a way that comments and doc strings don't affect the hash. I'm getting part way there by using the ast module to parse the code, then doing an ast.dump and hashing the resulting string. The natural next step for me is to clear the first Expr(str()) node in the body of every FunctionDef node.

Is there a better way to solve this problem? I can't help think that this is a problem that must have been solved many times already, but I cant find anything on stack overflow.

Thanks again for any help,
Luke

Hey so this is the code I have at the moment for hashing a function.

import ast
import hashlib
import inspect

def _remove_docstring(node):
    '''
    Removes all the doc strings in a FunctionDef or ClassDef as node.
    Arguments:
        node (ast.FunctionDef or ast.ClassDef): The node whose docstrings to
            remove.
    '''
    if not (isinstance(node, ast.FunctionDef) or
            isinstance(node, ast.ClassDef)):
        return

    if len(node.body) != 0:
        docstr = node.body[0]
        if isinstance(docstr, ast.Expr) and isinstance(docstr.value, ast.Str):
            node.body.pop(0)

#-------------------------------------------------------------------------------
def hash_function(func):
    '''
    Produces a hash for the code in the given function.
    Arguments:
        func (types.FunctionObject): The function to produce a hash for
    '''
    func_str = inspect.getsource(func)
    module = ast.parse(func_str)

    assert len(module.body) == 1 and isinstance(module.body[0], ast.FunctionDef)

    # Clear function name so it doesn't affect the hash
    func_node = module.body[0]
    func_node.name = "" 

    # Clear all the doc strings
    for node in ast.walk(module):
        _remove_docstring(node)

    # Convert the ast to a string for hashing
    ast_str = ast.dump(module, annotate_fields=False)

    # Produce the hash
    fhash = hashlib.sha256(ast_str)
    result = fhash.hexdigest()
    return result

#-------------------------------------------------------------------------------
# Function 1
def test(blah):
    'This is a test'
    class Test(object):
        '''
        My test class
        '''
    print blah
    def sub_function(foo):
        '''arg'''
print hash_function(test)

#-------------------------------------------------------------------------------
# Function 2
def test2(blah):
    'This is a test'
    class Test(object):
        '''
        My test class
        '''
    print blah
    def sub_function(foo):
        '''arg meh'''
print hash_function(test2)
Reply
#2
Why not just hash function.__code__?
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Hashing an address for binary file Python_help 8 2,644 Nov-04-2021, 06:23 AM
Last Post: ndc85430
  MD5 Hashing Evil_Patrick 1 2,063 Sep-21-2019, 07:47 AM
Last Post: buran

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020