Dec-10-2022, 08:51 PM
(This post was last modified: Dec-11-2022, 01:54 AM by deanhystad.)
I think your pattern for finding blocks is wrong. It should look more like this. Note that I left out some of the fields to keep this post short.
import re
import pandas as pd

pattern = re.compile(
    r"^(.*)\n"
    r"^Status: (.*)\n"
    r"^Category: (.*)\n"
    r"^Description: (.*)\n",
    flags=re.MULTILINE)

with open("data.txt", "r") as file:
    text = "".join(file)

columns = ["Item", "Status", "Category", "Description"]
print(pd.DataFrame.from_records(re.findall(pattern, text), columns=columns))

I made a dummy file, data.txt, with some valid and invalid blocks and extra fluff to ignore:
ItemName 1
Status: Status Item 1
Category: Category Item 1
Description: Description Text 1
extra
stuff
ItemName 2
Category: Order is wrong
Status: Status Item 2
Description: Description Text 2
extra
stuff
ItemName 3
Status: Status Item 3
Category: Category Item 3
Sub-Category: Extra field
Description: Description Text 3
extra
stuff
ItemName 4
Statis: Spelling error
Category: Category Item 4
Description: Description Text 4
extra
stuff
ItemName 5
Status: Status Item 5
Category: Category Item 5
Description: Description Text 5
extra
stuff
When I run the program it finds the two valid blocks.
Output:
         Item         Status         Category         Description
0  ItemName 1  Status Item 1  Category Item 1  Description Text 1
1  ItemName 5  Status Item 5  Category Item 5  Description Text 5
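
If you want to try the pattern without creating a file or installing pandas, here is a minimal sketch that runs the same regex against an inline string. The `sample` text is a shortened, made-up stand-in for data.txt, so treat it as an illustration rather than your real data. It shows that a block is only captured when the Status, Category and Description lines follow the item line in exactly that order.

```python
import re

# Same four-line block pattern as in the post: an item line followed by
# Status, Category and Description lines, in exactly that order.
pattern = re.compile(
    r"^(.*)\n"
    r"^Status: (.*)\n"
    r"^Category: (.*)\n"
    r"^Description: (.*)\n",
    flags=re.MULTILINE,
)

# Hypothetical sample text standing in for data.txt (an assumption for
# this sketch, not the original poster's file).
sample = (
    "ItemName 1\n"
    "Status: Status Item 1\n"
    "Category: Category Item 1\n"
    "Description: Description Text 1\n"
    "extra\n"
    "ItemName 2\n"
    "Category: Order is wrong\n"
    "Status: Status Item 2\n"
    "Description: Description Text 2\n"
    "ItemName 5\n"
    "Status: Status Item 5\n"
    "Category: Category Item 5\n"
    "Description: Description Text 5\n"
)

# findall returns one tuple of captured groups per matching block.
records = pattern.findall(sample)
for record in records:
    print(record)
```

Only the ItemName 1 and ItemName 5 blocks come back. The ItemName 2 block is skipped because its Category line comes before its Status line, so the pattern never lines up on it.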