Python Forum
Extracting Data from tables
Hi

I have code that extracts data from tables in a PDF, puts the data into columns, and writes it to a CSV file. The code works, but I need help with a few problems. From the first column of the table I need to build a hierarchy so that I can filter the data to find specific items. I have attached a photo.

I have a couple of problems with my code:

1. Level 1 takes any item that is all UPPERCASE and puts it in a new column, but it also returns items that contain numbers. How can I disregard items with numbers when using .isupper()?

2. I need a level 2, but I am finding it difficult to write code that can recognise bold text in the table and split that data into a column. Any ideas what I could use?
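
On point 1: `str.isupper()` only looks at cased characters, so a string such as "SECTION 12" still counts as uppercase. A minimal sketch of one way to rule those items out is to add an explicit digit check. The helper name `is_level1_heading` and the sample strings are made up for illustration and are not part of the code in this post.

```python
def is_level1_heading(text):
    """Return True for all-uppercase text that contains no digits."""
    # str.isupper() ignores digits, so "SECTION 12".isupper() is True;
    # the extra any() check rejects items that contain a number.
    return text.isupper() and not any(ch.isdigit() for ch in text)


# Hypothetical sample items, not taken from the real table
print(is_level1_heading("GROUNDWORKS"))      # True
print(is_level1_heading("SECTION 12"))       # False
print(is_level1_heading("Excavate trench"))  # False
```

In the loop below, this check could stand in for the bare `.isupper()` test on the Level 1 branch, if rejecting every item that contains a digit is the behaviour wanted.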

import pandas as pd

def is_continuation(df, j):
    # True if row j exists, does not start with an uppercase letter and has no rate (NaN)
    if j not in df.index:
        return False
    return (not df.loc[j, 'Item'][0].isupper()) and pd.isna(df.loc[j, 'Total Rate£'])

# Determine hierarchy (df_combine is assumed to have the default 0..n-1 index)
for i, row in df_combine.iterrows():
    item = row['Item']
    # Level 1: if it's all in uppercase it is a new level 1 hierarchy
    if item.isupper():
        df_combine.loc[i, 'Level1'] = item
    # Otherwise use the previous level 1 hierarchy
    elif i > 0:
        df_combine.loc[i, 'Level1'] = df_combine.loc[i - 1, 'Level1']

    # Future development: logic to determine level 2 hierarchy

    # Level 3: if it's not all uppercase, but the first character is, it is a level 3 hierarchy
    if not item.isupper() and item[0].isupper():
        parts = [item]
        # If the next row does not start with an uppercase letter and has no rate, join it to this row
        if is_continuation(df_combine, i + 1):
            parts.append(df_combine.loc[i + 1, 'Item'])
            # Same test for the row after that
            if is_continuation(df_combine, i + 2):
                parts.append(df_combine.loc[i + 2, 'Item'])
        # With no continuation rows, level 3 is just a one-liner
        df_combine.loc[i, 'Level3'] = ' '.join(parts)

# If a row doesn't have a level 3, use the one above
df_combine['Level3'] = df_combine['Level3'].ffill()
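
For the level 2 placeholder, one possible approach is to look at font names rather than the extracted text, since bold text usually comes from a font whose name contains "Bold". The sketch below assumes the PDF library can return per-character dictionaries with `text` and `fontname` keys (pdfplumber's `page.chars` has that shape, but the library used for this extraction is not stated here). The function names, the 0.9 threshold, and the sample font names are hypothetical, and some PDFs name their bold fonts differently.

```python
def bold_fraction(chars):
    """Fraction of non-space characters whose font name contains 'bold'."""
    visible = [c for c in chars if c["text"].strip()]
    if not visible:
        return 0.0
    bold = [c for c in visible if "bold" in c["fontname"].lower()]
    return len(bold) / len(visible)


def is_bold_line(chars, threshold=0.9):
    """Treat a line as bold when most of its characters use a bold font."""
    return bold_fraction(chars) >= threshold


# Hypothetical character data, shaped like per-character PDF metadata
sample_bold = [{"text": ch, "fontname": "ABCDEF+Arial-BoldMT"} for ch in "Drainage"]
sample_plain = [{"text": ch, "fontname": "ABCDEF+ArialMT"} for ch in "Pipe laying"]

print(is_bold_line(sample_bold))   # True
print(is_bold_line(sample_plain))  # False
```

Rows flagged this way could fill a `Level2` column and be carried forward in the same way as `Level1`.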
