Python Forum

Full Version: parse tree vs just tokenizing
i do not need a parse tree result. i only want to get a list of tokens from a string of Python source code. that means breaking the string up into the parts that make sense for working with the code. i believe this is the lexical phase of compilation. my code will only be doing simpler things, like replacing things in a certain context, such as when other tokens are present or absent. a typical tokenization might look like:
if this[x]==that[y]: where.at('this',x,y)
-> ['if','this','[','x',']','==','that','[','y',']',':','where.at','(',"'this'",',','x',',','y',')']
it does not matter if blank spaces are included or not. it does not matter if it splits around a dot or not.
Simply use tokenize.tokenize()
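For example, a minimal sketch using the string from your post (generate_tokens() is the str-based variant; tokenize.tokenize() itself wants a bytes readline):

import io
import tokenize

source = "if this[x]==that[y]: where.at('this',x,y)\n"

# print the token type name and the exact text of each token
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))

Each token comes back as a named tuple carrying the token type, the token text, and start/end positions in the source.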
does that tokenize according to the Python language syntax?
i originally thought of tokenizing in a different way, more like what a command would need. i wrote some code to do that back in my C days, and also way back in my assembler days, so i was thinking in those terms. i haven't needed anything like that yet, so i had no reason to look at the tokenize module. i knew it existed but hadn't had a reason to read up on it.

well, now i have, and not only is it a solution to my immediate need, it looks like something i can do many things with, including a Python-oriented editor where string substitution is applied only to specific language parts. for example, change "foo" to "bar" in names, not in string literals.
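for example, a rough sketch of that kind of rename (the rename() helper name is just something made up for illustration), filtering NAME tokens and rebuilding the source with untokenize():

import io
import tokenize

def rename(source, old, new):
    # swap identifier old for new, but only in NAME tokens,
    # so the same text inside string literals and comments is untouched
    toks = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string == old:
            tok = tok._replace(string=new)
        toks.append(tok)
    return tokenize.untokenize(toks)

print(rename("foo = 'foo' + foo  # foo\n", "foo", "bar"))
# prints: bar = 'foo' + bar  # foo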

i'm playing around with it tonight.
now if i could find code in Python to do this for the C language.
Skaperen Wrote: now if i could find code in Python to do this for the C language.

There is a well-known Python module named pycparser for parsing the C language. It contains an internal module for lexical analysis of C code, written using the PLY lexer/parser. You could have a look in this direction.
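For example, a rough sketch of driving that internal lexer directly. CLexer and its constructor arguments are internal pycparser details built on PLY and may differ between versions, so treat this as an assumption rather than a documented API:

from pycparser.c_lexer import CLexer

# CLexer wants an error callback, brace callbacks and a typedef lookup;
# for a plain token dump they can be close to no-ops
def error_func(msg, line, column):
    print("lex error:", msg, "at", line, column)

def on_brace():
    pass

def type_lookup(name):
    # pycparser asks this to tell typedef'd type names from plain identifiers
    return False

clex = CLexer(error_func, on_brace, on_brace, type_lookup)
clex.build()          # builds the underlying PLY lexer
clex.input("if (this[x] == that[y]) where_at(x, y);")

while True:
    tok = clex.token()
    if tok is None:
        break
    print(tok.type, tok.value)

If poking at internals feels too fragile, the public entry point pycparser.c_parser.CParser().parse() gives you a full AST of the C code instead of a token stream.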