i would like a way to parse a line of python source like split

Skaperen · Nov-11-2018, 03:59 AM

i would like to parse a line of python source instead of merely split it:

    line = "blank=' '"
    parts = line.split()
    parts -> ["blank='","'"]
    pieces = ??parse??(line)
    pieces -> ["blank","=","' '"]

i want to get what pieces ends up with.

"a[0]=int(ab[16])+2" -> ["a","[","0","]","=","int","(","ab","[","16","]",")","+","2"]

i need to parse python source one line at a time so that other scripts can make edits, although i am not sure how best to handle continuation lines. a comment should come back as one big string. if whitespace is always included, that's ok.

***stranac*** · Nov-11-2018, 06:38 AM

You can use the tokenize module to get a list of tokens from python code:

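>>> import io, tokenize
>>> from pprint import pprint as pp  # assumption: any pretty-printer will do; its layout may differ from the output below
>>> f = io.BytesIO(b"a[0]=int(ab[16])+2")
>>> tokens = list(tokenize.tokenize(f.readline))
>>> pp(tokens)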
[
    tokenize.TokenInfo(
        type=57,
        string='utf-8',
        start=(0, 0),
        end=(0, 0),
        line=''
    ),
    tokenize.TokenInfo(
        type=1,
        string='a',
        start=(1, 0),
        end=(1, 1),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=53,
        string='[',
        start=(1, 1),
        end=(1, 2),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=2,
        string='0',
        start=(1, 2),
        end=(1, 3),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=53,
        string=']',
        start=(1, 3),
        end=(1, 4),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=53,
        string='=',
        start=(1, 4),
        end=(1, 5),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=1,
        string='int',
        start=(1, 5),
        end=(1, 8),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=53,
        string='(',
        start=(1, 8),
        end=(1, 9),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=1,
        string='ab',
        start=(1, 9),
        end=(1, 11),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=53,
        string='[',
        start=(1, 11),
        end=(1, 12),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=2,
        string='16',
        start=(1, 12),
        end=(1, 14),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=53,
        string=']',
        start=(1, 14),
        end=(1, 15),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=53,
        string=')',
        start=(1, 15),
        end=(1, 16),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=53,
        string='+',
        start=(1, 16),
        end=(1, 17),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=2,
        string='2',
        start=(1, 17),
        end=(1, 18),
        line='a[0]=int(ab[16])+2'
    ),
    tokenize.TokenInfo(
        type=0,
        string='',
        start=(2, 0),
        end=(2, 0),
        line=''
    )
]
>>> [t.string for t in tokens]
['utf-8', 'a', '[', '0', ']', '=', 'int', '(', 'ab', '[', '16', ']', ')', '+', '2', '']
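
To get exactly the list asked for in the first post, the bookkeeping tokens (encoding, newline, indent/dedent, end marker) can be filtered out. The following is a minimal sketch, not a definitive implementation: `split_source_line` is a hypothetical helper name, and it uses `tokenize.generate_tokens`, which reads `str` lines and therefore emits no encoding token.

```python
import io
import tokenize

# Token types that carry no source text of interest for this purpose.
_SKIP = {
    tokenize.ENCODING,
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def split_source_line(line):
    """Return the token strings of one line of Python source (hypothetical helper)."""
    readline = io.StringIO(line).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in _SKIP
    ]


print(split_source_line("a[0]=int(ab[16])+2"))
# ['a', '[', '0', ']', '=', 'int', '(', 'ab', '[', '16', ']', ')', '+', '2']
print(split_source_line("blank=' '"))
# ['blank', '=', "' '"]
```

A comment comes back as a single string, as requested. A line that ends inside an open bracket or a triple-quoted string is not a complete statement, so the tokenizer will likely raise `tokenize.TokenError` for it; continuation lines would need to be joined before calling the helper.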

However, if you're looking to modify the code, using ast with a custom NodeTransformer might be simpler. ast will get rid of comments though, so that might be a problem.
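
As a rough illustration of the ast route, here is a minimal sketch that renames one identifier. The class name `RenameName` and the rename itself are made up for the example, and `ast.unparse` assumes Python 3.9 or later (older versions need a third-party unparser).

```python
import ast


class RenameName(ast.NodeTransformer):
    """Rename every use of one identifier (hypothetical example edit)."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def visit_Name(self, node):
        # Replace matching Name nodes, keeping the load/store context.
        if node.id == self.old:
            return ast.copy_location(ast.Name(id=self.new, ctx=node.ctx), node)
        return node


source = "a[0] = int(ab[16]) + 2  # trailing comment\n"
tree = RenameName("ab", "data").visit(ast.parse(source))
result = ast.unparse(tree)  # requires Python 3.9+
print(result)
```

The printed statement has `ab` renamed to `data`, but the trailing comment is gone and the original spacing is not preserved, which is the drawback mentioned above.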

Skaperen · Nov-11-2018, 07:35 PM

yeah, i am looking to modify code. but i am also looking to test if lines of code have a matching pattern. if so, the line will be subject to modification. if not, the line will be printed in whole. some modifications may need to wait until some later lines to determine what the modification is. some lines could be deleted.

the most difficult part is that things i will be looking for could be in quoted string literals. something that breaks up code from quoted string literal contents and comments would be useful, i think.

it is possible that a comment line or a comment at the end of a line could match the pattern as a false positive. a # could be in a string literal, giving a false appearance of a comment. comments could be a tough issue, though not as tough for Python as for the C/Pike code i have needed to modify, due to embedded comments.
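
One way to avoid those false positives is to let tokenize decide what is code, what is a string literal, and what is a comment, and then match only against the code tokens. This is a sketch under assumptions: `classify_line` and `code_matches` are hypothetical helper names, each line is assumed to be a complete statement, and f-strings are tokenized differently on Python 3.12 and later, so they would not be labeled "string" by this simple mapping.

```python
import io
import tokenize

_KINDS = {tokenize.STRING: "string", tokenize.COMMENT: "comment"}
_SKIP = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def classify_line(line):
    """Return (kind, text) pairs, where kind is 'code', 'string' or 'comment'."""
    pairs = []
    for tok in tokenize.generate_tokens(io.StringIO(line).readline):
        if tok.type in _SKIP:
            continue
        pairs.append((_KINDS.get(tok.type, "code"), tok.string))
    return pairs


def code_matches(line, name):
    """True if `name` appears as a code token, ignoring strings and comments."""
    return any(kind == "code" and text == name for kind, text in classify_line(line))


line = 'print("# not a comment", x)  # x is mentioned here too'
for kind, text in classify_line(line):
    print(kind, repr(text))

print(code_matches('y = "x"  # x', "x"))  # False: only in a string and a comment
print(code_matches("y = x + 1", "x"))     # True
```

Because the `#` inside the string literal arrives as part of a single string token, it is never mistaken for the start of a comment.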
