Python Forum
i would like a way to parse a line of python source like split
#1
i would like to parse a line of python source instead of merely splitting it:
    line = "blank=' '"
    parts = line.split()
    parts -> ["blank='","'"]
    pieces = ??parse??(line)
    pieces -> ["blank","=","' '"]
i want to get what pieces ends up with.
"a[0]=int(ab[16])+2" -> ["a","[","0","]","=","int","(","ab","[","16","]",")","+","2"]
i need to parse some python source only one line at a time to make some edits by other scripts, although i am not sure how best to handle continuations. comments should be one big string. if white spaces are always included, that's ok.
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
You can use the tokenize module to get a list of tokens from python code:
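>>> import io, tokenize
>>> # pp is assumed to be a pretty-printing helper; the layout below depends on which one is used
>>> f = io.BytesIO(b"a[0]=int(ab[16])+2")
>>> tokens = list(tokenize.tokenize(f.readline))
>>> pp(tokens)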
[
    tokenize.TokenInfo(
        type=57,
        string='utf-8',
        start=(0, 0),
        end=(0, 0),
        line=''
    ),
    tokenize.TokenInfo(
        type=1,
        string='a',
        start=(1, 0),
        end=(1, 1),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=53,
        string='[',
        start=(1, 1),
        end=(1, 2),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=2,
        string='0',
        start=(1, 2),
        end=(1, 3),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=53,
        string=']',
        start=(1, 3),
        end=(1, 4),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=53,
        string='=',
        start=(1, 4),
        end=(1, 5),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=1,
        string='int',
        start=(1, 5),
        end=(1, 8),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=53,
        string='(',
        start=(1, 8),
        end=(1, 9),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=1,
        string='ab',
        start=(1, 9),
        end=(1, 11),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=53,
        string='[',
        start=(1, 11),
        end=(1, 12),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=2,
        string='16',
        start=(1, 12),
        end=(1, 14),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=53,
        string=']',
        start=(1, 14),
        end=(1, 15),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=53,
        string=')',
        start=(1, 15),
        end=(1, 16),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=53,
        string='+',
        start=(1, 16),
        end=(1, 17),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=2,
        string='2',
        start=(1, 17),
        end=(1, 18),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=0,
        string='',
        start=(2, 0),
        end=(2, 0),
        line=''
    )
]
>>> [t.string for t in tokens]
['utf-8', 'a', '[', '0', ']', '=', 'int', '(', 'ab', '[', '16', ']', ')', '+', '2', '']
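To get exactly the list asked for in the first post, the bookkeeping tokens (encoding, newline, end marker) can be filtered out. The following is only a sketch: the function name `pieces` and the `_SKIP` set are made up for illustration, and it uses `tokenize.generate_tokens` so that it can take a `str` instead of bytes.

```python
import io
import tokenize

# Token types that describe structure rather than visible source text.
_SKIP = {
    tokenize.ENCODING,
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}

def pieces(line):
    """Return the token strings of one line of Python source."""
    readline = io.StringIO(line).readline
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type not in _SKIP]

print(pieces("blank=' '"))           # ['blank', '=', "' '"]
print(pieces("a[0]=int(ab[16])+2"))
print(pieces("x = 1  # a comment"))  # the comment comes back as one string
```

Whitespace between tokens is dropped here; the `start` and `end` fields of each token can be used to recover it if needed. A line that is only part of a statement (for example one that opens a bracket and does not close it) will make the tokenizer raise an error, typically `tokenize.TokenError`, so continuation lines would have to be joined before being passed in.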
However, if you're looking to modify the code, using ast with a custom NodeTransformer might be simpler. ast will get rid of comments though, so that might be a problem.
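As a rough illustration of the ast approach, here is a minimal sketch of a NodeTransformer that renames one variable. The class name `RenameName` and the replacement name `table` are invented for the example, and `ast.unparse` needs Python 3.9 or later.

```python
import ast

class RenameName(ast.NodeTransformer):
    """Rename every use of one variable name to another (illustrative only)."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def visit_Name(self, node):
        if node.id == self.old:
            # Build a replacement node that keeps the original position info.
            return ast.copy_location(ast.Name(id=self.new, ctx=node.ctx), node)
        return node

source = "a[0] = int(ab[16]) + 2  # this comment will be lost"
tree = ast.parse(source)
tree = RenameName("ab", "table").visit(tree)
ast.fix_missing_locations(tree)
result = ast.unparse(tree)  # Python 3.9+
print(result)  # a[0] = int(table[16]) + 2
```

The output shows both sides of the trade-off: the rename only touches real names, never string contents, but the comment and the original spacing are gone because they are not part of the syntax tree.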
#3
yeah, i am looking to modify code. but i am also looking to test if lines of code have a matching pattern. if so, the line will be subject to modification. if not, the line will be printed in whole. some modifications may need to wait until some later lines to determine what the modification is. some lines could be deleted.

the most difficult part is that things i will be looking for could be in quoted string literals. something that breaks up code from quoted string literal contents and comments would be useful, i think.

it is possible that a comment line or a comment at the end of a line could match the pattern as a false-positive. a # could be in a string literal giving a false appearance of a comment. comments could be a tough issue though not as tough for Python as C/Pike code was when i have needed to modify it, due to embedded comments.
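One way to separate code from string literals and comments is to look at the token types that tokenize reports. The sketch below is only an assumption about how the matching could be organised; `classify` and `code_mentions` are hypothetical helper names, not part of any library.

```python
import io
import tokenize

def classify(line):
    """Yield (kind, text) pairs; kind is 'string', 'comment' or 'code'."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
            tokenize.INDENT, tokenize.DEDENT}
    for tok in tokenize.generate_tokens(io.StringIO(line).readline):
        if tok.type in skip:
            continue
        if tok.type == tokenize.STRING:
            yield "string", tok.string
        elif tok.type == tokenize.COMMENT:
            yield "comment", tok.string
        else:
            yield "code", tok.string

def code_mentions(line, name):
    """True if `name` is a token in real code, not inside a string or comment."""
    return any(kind == "code" and text == name
               for kind, text in classify(line))

line = 'msg = "# not a comment, total"  # total goes here'
print(list(classify(line)))
print(code_mentions(line, "total"))         # False: only in string and comment
print(code_mentions("total = 1", "total"))  # True
```

Because the tokenizer already knows where string literals start and end, a `#` inside quotes is reported as part of a string token and never as a comment. One caveat: on Python 3.12 and later, f-strings are split into several tokens of their own types, so this sketch would label their parts as code unless those types are handled too.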