Python Forum
How to properly format rows and columns in excel data from parsed .txt blocks
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
How to properly format rows and columns in excel data from parsed .txt blocks
#1
Hi. I'm trying to create an excel sheet using data made up from specific, recurring blocks of text in a .txt file and ignoring all other data in the file (table of contents, etc). The blocks of text are in the format shown here and I'm having two different issues with the output in the final excel sheet.

Quote:ItemName
Status: StatusItem
Category: CategoryItem
Sub-Category: SubcategoryItem
Date: MM/dd/yyyy
Frequency: FrequenceItem
Language: LanguageItem
Description: DescriptionText

Currently, my code is using a dictionary made by searching for a block pattern, inserting it into a data-frame (pandas) and exporting to excel. The issue I'm having, however, is that it creates a new row for each line in the text block rather than creating a new row for each block. Shown in the attached image.

   

How can I prevent this with my block pattern?


The second issue is that the top item ("ITEMNAME") is not in the same format as it doesn't have a "key" to use in the dictionary. How would I include it in the block pattern to add to the dictionary and export being in a different format?

Code:

import pandas as pd
import re

txtFilePath = r'/test.txt'

with open(txtFilePath, 'r') as f:
    text = f.read()

# define pattern for block search - first method
block_pattern = re.compile(r"^(?:Status|Event Category|Event Sub-Category|"
                           r"Added Date|Update Frequency|Language|Description):.+$",
                           re.MULTILINE)


# find all blocks that match the pattern
blocks = block_pattern.findall(text)

df = pd.DataFrame()
# columns=["Status", "Category", "Sub-Category", "Date", "Frequency", "Language", "Description"]

# iterate over blocks and extract patterns
for block in blocks:
    # split blocks on individual lines
    lines = block.strip().split("\n")
    # dict to store items for this block
    item_dict = {}

    # iterate to find type and values
    for line in lines:
        item_parts = line.strip().split(":", 1)
        item_type = item_parts[0].strip()
        item_value = item_parts[1].strip()

        # add item to dict
        item_dict[item_type] = item_value

    df = df.append(item_dict, ignore_index=True)

# print(df)
df.to_excel(r'/text.xlsx')
I've tried two changes to the append line to fix the extra rows.

First change:
df.loc[0] = df.append(item_dict, ignore_index=True)
but it gives this error:
>
Quote:ValueError: cannot set row with mismatched columns

Second Change:
df.loc[0] = item_dict
This change creates the columns but imports no data from the dictionary. The excel sheet has columns but nothing else.

I've been scouring the internet for solutions and have really had no luck. Thanks in advance for any assistance.
Reply


Messages In This Thread
How to properly format rows and columns in excel data from parsed .txt blocks - by jh67 - Dec-09-2022, 08:00 PM

Possibly Related Threads…
Thread Author Replies Views Last Post
  Excel isnt working properly after python function is started IchNar 2 325 May-01-2024, 06:43 PM
Last Post: IchNar
  RSA Cipher with blocks Paragoon2 0 532 Nov-26-2023, 04:35 PM
Last Post: Paragoon2
  Export data from PDF as tabular format zinho 5 782 Nov-11-2023, 08:23 AM
Last Post: Pedroski55
  how do you style data frame that has empty rows. gsaray101 0 550 Sep-08-2023, 05:20 PM
Last Post: gsaray101
  Copy data from Excel and paste into Discord (Midjourney) Joe_Wright 4 2,151 Jun-06-2023, 05:49 PM
Last Post: rajeshgk
  Reading data from excel file –> process it >>then write to another excel output file Jennifer_Jone 0 1,154 Mar-14-2023, 07:59 PM
Last Post: Jennifer_Jone
  Converting a json file to a dataframe with rows and columns eyavuz21 13 4,743 Jan-29-2023, 03:59 PM
Last Post: eyavuz21
  Comparing two columns with same value but different font format doug2019 1 750 Jan-08-2023, 02:58 PM
Last Post: Larz60+
  (Python) Pulling data from UA Google Analytics with more than 100k rows into csv. Stockers 0 1,275 Dec-19-2022, 11:11 PM
Last Post: Stockers
  Extracting Data into Columns using pdfplumber arvin 17 5,755 Dec-17-2022, 11:59 AM
Last Post: arvin

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020