Python Forum
parse tree vs just tokenizing - Printable Version



parse tree vs just tokenizing - Skaperen - Jul-01-2021

i do not need a parse tree result. i only want to get a list of tokens from a string of Python source code. that means breaking up a string into the parts that make sense for working with the code. i believe this is the lexical analysis phase of a compilation. my code will only be doing simpler things, like replacing things in a certain context, such as when other tokens are present or absent. a typical tokenization might look like:
if this[x]==that[y]: where.at('this',x,y)
-> ['if','this','[','x',']','==','that','[','y',']',':','where.at','(',"'this'",',','x',',','y',')']
it does not matter if blank spaces are included or not. it does not matter if it splits around a dot or not.


RE: parse tree vs just tokenizing - Gribouillis - Jul-02-2021

Simply use tokenize.tokenize()
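
A minimal sketch of how that could look for the example line in the first post. The helper name token_strings is hypothetical, and dropping the bookkeeping tokens (encoding, newlines, indentation, end marker) is an assumption about what the original poster wants to keep:

```python
import io
import tokenize

# Token types that carry layout information rather than source text.
# Which of these to drop is an assumption; keep them if you need them.
LAYOUT_TYPES = {
    tokenize.ENCODING,
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def token_strings(source):
    """Return the text of each token in a string of Python source code."""
    # tokenize.tokenize() expects a readline callable that yields bytes.
    readline = io.BytesIO(source.encode("utf-8")).readline
    return [
        tok.string
        for tok in tokenize.tokenize(readline)
        if tok.type not in LAYOUT_TYPES
    ]


print(token_strings("if this[x]==that[y]: where.at('this',x,y)\n"))
```

Note that where.at comes back as three tokens ('where', '.', 'at'), and that blank spaces are not returned as tokens. Each token also carries its type and its start/end position, which is what makes context-dependent replacement possible.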


RE: parse tree vs just tokenizing - Skaperen - Jul-03-2021

does that tokenize according to the Python language syntax?


RE: parse tree vs just tokenizing - Skaperen - Jul-04-2021

i originally thought of tokenizing in a different way, more like what a command would need. i wrote some code to do that back in my C days and also way back in my assembler days, so i was thinking in those terms. i haven't needed anything like that, yet, so i had no reason to look at the tokenize module. i knew it existed but hadn't had a reason to read up on it.

well, now i have, and not only is it a solution to my immediate need, but it looks like something i can do many things with, including a Python-oriented editor where string substitution can be applied to specific language parts. for example, change "foo" to "bar" in names, but not in string literals.
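
a rough sketch of that kind of substitution, assuming the tokenize module is used to find the NAME tokens and the source lines are then patched by position. the function name rename_name is made up for illustration, and this works purely at the token level: it knows nothing about scopes, so attributes and keyword arguments with the same name are renamed too.

```python
import io
import tokenize


def rename_name(source, old, new):
    """Replace NAME tokens equal to `old` with `new`, leaving strings and comments alone."""
    # Read lines the same way the tokenizer does so row numbers line up.
    lines = io.StringIO(source).readlines()

    # Collect the (row, column) of every NAME token that matches exactly.
    hits = [
        tok.start
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.NAME and tok.string == old
    ]

    # Patch from the end backwards so earlier column offsets stay valid
    # even when `new` has a different length than `old`.
    for row, col in sorted(hits, reverse=True):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + new + line[col + len(old):]

    return "".join(lines)


print(rename_name('foo = "foo" + str(foo)  # foo\n', "foo", "bar"))
```

the string literal and the comment are separate token types (STRING and COMMENT), so they are left untouched while both uses of the name are changed.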

i'm playing around, tonight.


RE: parse tree vs just tokenizing - Skaperen - Jul-04-2021

now if i could find code in Python to do this for the C language.


RE: parse tree vs just tokenizing - Gribouillis - Jul-04-2021

Skaperen Wrote: now if i could find code in Python to do this for the C language.

There is a well-known Python module named pycparser for parsing the C language. It seems to contain an internal Python module for lexical analysis of C code, written using the PLY lexer/parser. You could have a look in this direction.
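
To give an idea of what a C tokenizer in Python involves, here is a small regex-based sketch using only the standard library. This is not pycparser's lexer and not PLY; it is a simplified stand-in. The names C_TOKEN and c_tokens are made up for illustration, and it ignores preprocessor handling, trigraphs, digraphs, and non-ASCII identifiers.

```python
import re

# One alternative per token category; earlier alternatives win.
C_TOKEN = re.compile(
    r"""
    (?P<comment>/\*.*?\*/|//[^\n]*)
  | (?P<string>"(?:\\.|[^"\\\n])*")
  | (?P<char>'(?:\\.|[^'\\\n])+')
  | (?P<number>\.?\d(?:[eEpP][+-]|[\w.])*)
  | (?P<name>[A-Za-z_]\w*)
  | (?P<op>>>=|<<=|\.\.\.|->|\+\+|--|<<|>>|<=|>=|==|!=|&&|\|\||[-+*/%&|^]=|[^\s\w])
  | (?P<space>\s+)
    """,
    re.VERBOSE | re.DOTALL,
)


def c_tokens(source):
    """Return (category, text) pairs for a string of C source, skipping spaces and comments."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = C_TOKEN.match(source, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at offset {pos}: {source[pos]!r}")
        if match.lastgroup not in ("space", "comment"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for category, text in c_tokens("if (a[i] >= 0x1F) x->n += 1; /* c */"):
    print(category, text)
```

For real C code, a lexer built for the job, such as the one inside pycparser, is likely the better choice, since it deals with the many corner cases this sketch skips.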