Nov-11-2018, 06:38 AM
You can use the tokenize module to get a list of tokens from Python code:
>>> import io, tokenize
>>> from pprint import pp
>>> f = io.BytesIO(b"a[0]=int(ab[16])+2")
>>> tokens = list(tokenize.tokenize(f.readline))
>>> pp(tokens)
[tokenize.TokenInfo(type=57, string='utf-8', start=(0, 0), end=(0, 0), line=''),
 tokenize.TokenInfo(type=1, string='a', start=(1, 0), end=(1, 1), line='a[0]=int(ab[16])+2'),
 tokenize.TokenInfo(type=53, string='[', start=(1, 1), end=(1, 2), line='a[0]=int(ab[16])+2'),
 tokenize.TokenInfo(type=2, string='0', start=(1, 2), end=(1, 3), line='a[0]=int(ab[16])+2'),
 tokenize.TokenInfo(type=53, string=']', start=(1, 3), end=(1, 4), line='a[0]=int(ab[16])+2'),
 tokenize.TokenInfo(type=53, string='=', start=(1, 4), end=(1, 5), line='a[0]=int(ab[16])+2'),
 tokenize.TokenInfo(type=1, string='int', start=(1, 5), end=(1, 8), line='a[0]=int(ab[16])+2'),
 tokenize.TokenInfo(type=53, string='(', start=(1, 8), end=(1, 9), line='a[0]=int(ab[16])+2'),
 tokenize.TokenInfo(type=1, string='ab', start=(1, 9), end=(1, 11), line='a[0]=int(ab[16])+2'),
 tokenize.TokenInfo(type=53, string='[', start=(1, 11), end=(1, 12), line='a[0]=int(ab[16])+2'),
 tokenize.TokenInfo(type=2, string='16', start=(1, 12), end=(1, 14), line='a[0]=int(ab[16])+2'),
 tokenize.TokenInfo(type=53, string=']', start=(1, 14), end=(1, 15), line='a[0]=int(ab[16])+2'),
 tokenize.TokenInfo(type=53, string=')', start=(1, 15), end=(1, 16), line='a[0]=int(ab[16])+2'),
 tokenize.TokenInfo(type=53, string='+', start=(1, 16), end=(1, 17), line='a[0]=int(ab[16])+2'),
 tokenize.TokenInfo(type=2, string='2', start=(1, 17), end=(1, 18), line='a[0]=int(ab[16])+2'),
 tokenize.TokenInfo(type=0, string='', start=(2, 0), end=(2, 0), line='')]
>>> [t.string for t in tokens]
['utf-8', 'a', '[', '0', ']', '=', 'int', '(', 'ab', '[', '16', ']', ')', '+', '2', '']

However, if you're looking to modify the code, using ast with a custom NodeTransformer might be simpler. Note that ast discards comments, though, so that might be a problem.
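To show the NodeTransformer approach, here is a minimal sketch (assuming Python 3.9+ for ast.unparse). The transformation itself is made up for illustration: it doubles every integer subscript, so ab[16] becomes ab[32], but any rewrite follows the same visit/return pattern.

```python
import ast

class DoubleIndices(ast.NodeTransformer):
    """Illustrative transformer: double every constant integer subscript."""

    def visit_Subscript(self, node):
        self.generic_visit(node)  # transform any nested subscripts first
        if isinstance(node.slice, ast.Constant) and isinstance(node.slice.value, int):
            node.slice = ast.Constant(node.slice.value * 2)
        return node  # returning the (modified) node keeps it in the tree

tree = ast.parse("a[0] = int(ab[16]) + 2")
new_tree = ast.fix_missing_locations(DoubleIndices().visit(tree))
print(ast.unparse(new_tree))  # a[0] = int(ab[32]) + 2
```

ast.fix_missing_locations fills in line/column info for the newly created Constant nodes, which you need if you later want to compile() the modified tree.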