Python Forum
parse tree vs just tokenizing
#1
I do not need a parse tree as the result. I only want to get a list of tokens from a string of Python source code, that is, to break the string up into the parts that make sense for working with the code. I believe this is the lexical phase of compilation. My code will only do simpler things, like replacing things in a certain context, such as when other tokens are present or absent. A typical tokenization might look like:
if this[x]==that[y]: where.at('this',x,y)
-> ['if','this','[','x',']','==','that','[','y',']',':','where.at','(',"'this'",',','x',',','y',')']
It does not matter whether blank spaces are included or not. It does not matter whether it splits around a dot or not.
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
Simply use tokenize.tokenize()
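A minimal sketch of that suggestion (the helper name `py_tokens` is hypothetical). `tokenize.tokenize()` expects a `readline` callable that returns bytes, so for a `str` the companion `tokenize.generate_tokens()` is the more convenient entry point. Each token carries a type such as NAME, OP, STRING or COMMENT, which is what makes context-dependent replacement possible. Note that `where.at` comes out as three tokens (`where`, `.`, `at`), which the original question says is acceptable.

```python
import io
import tokenize

def py_tokens(source):
    """Return (token_type_name, text) pairs for a string of Python source."""
    # generate_tokens() takes a readline that returns str;
    # tokenize.tokenize() is the bytes-based variant.
    readline = io.StringIO(source).readline
    result = []
    for tok in tokenize.generate_tokens(readline):
        # Skip layout-only tokens that the question does not care about.
        if tok.type in (tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
                        tokenize.DEDENT, tokenize.ENDMARKER):
            continue
        result.append((tokenize.tok_name[tok.type], tok.string))
    return result

line = "if this[x]==that[y]: where.at('this',x,y)\n"
print([text for _, text in py_tokens(line)])
# ['if', 'this', '[', 'x', ']', '==', 'that', '[', 'y', ']', ':', 'where', '.', 'at', '(', "'this'", ',', 'x', ',', 'y', ')']
```

Because tokenize follows Python's lexical rules, something like `'a==b'` stays a single STRING token instead of being split at the `==`.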
#3
Does that tokenize according to the Python language syntax?
#4
I originally thought of tokenizing in a different way, more like what a command would need. I wrote code to do that back in my C days, and also way back in my assembler days, so I was thinking in those terms. I haven't needed anything like that yet, so I had no reason to look at the tokenize module. I knew it existed but hadn't had a reason to read up on it.

Well, now I have, and not only is it a solution to my immediate need, it also looks like something I can do many things with, including a Python-oriented editor that applies string substitution only to specific language parts: for example, changing "foo" to "bar" in names but not in string literals.
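One way the "names, not string literals" substitution could be done, sketched under the assumption that an exact-match rename of NAME tokens is what is wanted (the helper `rename_names` is hypothetical): collect the positions of matching NAME tokens, then splice the replacement into the source lines, working from the end so earlier column offsets stay valid. Comments and string literals are left alone because tokenize reports them as COMMENT and STRING tokens.

```python
import io
import tokenize

def rename_names(source, old, new):
    """Replace identifier `old` with `new` only where it is a NAME token."""
    # Split lines the same way tokenize's readline does (on "\n" only).
    lines = io.StringIO(source).readlines()
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string == old:
            hits.append(tok.start)  # (row, col); row is 1-based
    # Edit from the last hit backwards so earlier offsets are not shifted.
    for row, col in reversed(hits):
        text = lines[row - 1]
        lines[row - 1] = text[:col] + new + text[col + len(old):]
    return "".join(lines)

src = 'foo = "foo"  # foo\nprint(foo)\n'
print(repr(rename_names(src, "foo", "bar")))
# 'bar = "foo"  # foo\nprint(bar)\n'
```

Keep in mind that attribute names (`obj.foo`) and keyword arguments (`f(foo=1)`) are NAME tokens too, so they get renamed as well. On Python 3.12 and later, expressions inside f-string braces are tokenized individually, so `f"{foo}"` would be renamed there, whereas older versions treat the whole f-string as one STRING token.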

I'm playing around with it tonight.
#5
Now if only I could find code in Python to do this for the C language.
#6
Skaperen Wrote: Now if only I could find code in Python to do this for the C language.

There is a well-known Python module named pycparser for parsing the C language. It seems to contain an internal Python module for lexical analysis of C code, written using the PLY lexer/parser. You could look in that direction.
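If pulling in pycparser is more than is needed, a hand-rolled regex lexer is a rough alternative. The sketch below is a different, swapped-in technique and not pycparser's PLY-based lexer; `TOKEN_RE` and `c_tokens` are hypothetical names. It assumes fairly ordinary C: it does not tell keywords from identifiers, does not expand macros, and treats each preprocessor directive (including backslash continuations) as a single token.

```python
import re

# Rough regex lexer for C (NOT pycparser's lexer). Alternatives are tried
# left to right, so comments precede "/" and longer operators precede
# their prefixes (e.g. "<<=" before "<<", "->" before "-").
TOKEN_RE = re.compile(r"""
    (?P<comment>/\*.*?\*/|//[^\n]*)
  | (?P<preproc>\#(?:\\\n|[^\n])*)
  | (?P<string>"(?:\\.|[^"\\\n])*")
  | (?P<char>'(?:\\.|[^'\\\n])+')
  | (?P<number>(?:0[xX][0-9a-fA-F]+|\d+\.?\d*(?:[eE][+-]?\d+)?|\.\d+(?:[eE][+-]?\d+)?)[uUlLfF]*)
  | (?P<name>[A-Za-z_]\w*)
  | (?P<op>\.\.\.|<<=|>>=|->|\+\+|--|<<|>>|<=|>=|==|!=|&&|\|\||[-+*/%&|^!=<>]=?|[~?:;,.(){}\[\]])
  | (?P<space>\s+)
""", re.VERBOSE | re.DOTALL)

def c_tokens(text):
    """Split C source into (kind, text) pairs, dropping whitespace."""
    pos = 0
    out = []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        # lastgroup is the name of the alternative that matched.
        if m.lastgroup != "space":
            out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out

print(c_tokens("p->n += 0x1F; // next"))
# [('name', 'p'), ('op', '->'), ('name', 'n'), ('op', '+='), ('number', '0x1F'), ('op', ';'), ('comment', '// next')]
```

As with Python's tokenize, the token kinds let you restrict a substitution to names while leaving string literals and comments untouched.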