Python Forum
Extracting Data into Columns using pdfplumber
#8
I'm starting to understand your problem!

Using reportlab, I made a very simple PDF containing one table of 4 rows and 5 columns. The table had borders.

The data for the table looks like this:

Quote:data= [['00', '', '02', '', '04'],
['10', '', '12', '', '14'],
['20', '', '22', '', '24'],
['30', '', '32', '', '34']]

As you can see, columns 2 and 4 contain no data, just empty strings ('').

Using this, I got the data from the table:

import pdfplumber

# path2pdf and savename2 must already be defined (folder and file name of the PDF)
with pdfplumber.open(path2pdf + savename2) as pdf:
    # Get the first page of the PDF
    page = pdf.pages[0]
    # Get the text of the page
    text = page.extract_text()
    # Get all the tables on this page
    tables = page.extract_tables()
    # Loop over the tables
    for table in tables:
        # Loop over the rows of each table
        for row in table:
            print(row)
This gives the output below. You can see the empty values as '', so they should end up in your pandas df as NaN, I suppose (pandas keeps '' as an empty string, so you may have to convert it to NaN yourself).

Output:
['00', '', '02', '', '04']
['10', '', '12', '', '14']
['20', '', '22', '', '24']
['30', '', '32', '', '34']
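
To check the NaN idea, here is a small sketch that takes the rows printed above and loads them into pandas. It assumes you turn each '' into None before building the DataFrame, because pandas does not treat an empty string as missing by itself. The names rows, cleaned and empty_cols are just for illustration.

```python
import pandas as pd

# The rows as printed by the extract_tables() loop
rows = [
    ['00', '', '02', '', '04'],
    ['10', '', '12', '', '14'],
    ['20', '', '22', '', '24'],
    ['30', '', '32', '', '34'],
]

# Map empty strings to None so pandas counts them as missing values
cleaned = [[cell if cell != '' else None for cell in row] for row in rows]
df = pd.DataFrame(cleaned)

# Columns in which every row is missing
empty_cols = df.columns[df.isna().all()].tolist()

print(df)
print(empty_cols)
```

With this data, the columns at index 1 and 3 (the 2nd and 4th) are the ones reported as completely empty.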
But when I make a table with no borders, pdfplumber finds no table, just text:

Quote:>>> tables
[]
>>> text
'00 02 04\n11 13\n20 22 23 24\n30 31 32 34'
>>>

Now, how do we know which columns in which rows have nothing in them?

Think think!
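
One idea, only a sketch and not tested on a real PDF: instead of the plain text, use the word positions. pdfplumber's page.extract_words() returns one dict per word with keys such as "text", "x0" and "top". If the cells are left-aligned, words in the same column share roughly the same x0, so you can group the x0 values into columns, group the top values into rows, and any grid cell that gets no word is empty. The coordinates below are made up to match the borderless text above (I am guessing that '11' and '13' sit in the 2nd and 4th columns), and cluster, nearest, words_to_grid and the tol value are my own hypothetical helpers, not part of pdfplumber.

```python
def cluster(values, tol=3.0):
    """Group sorted values that lie within tol of their neighbour; return one centre per group."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return [sum(g) / len(g) for g in groups]


def nearest(centres, value):
    """Index of the centre closest to value."""
    return min(range(len(centres)), key=lambda i: abs(centres[i] - value))


def words_to_grid(words, tol=3.0):
    """Place each word in a row/column grid; cells with no word stay ''."""
    cols = cluster([w["x0"] for w in words], tol)
    rows = cluster([w["top"] for w in words], tol)
    grid = [["" for _ in cols] for _ in rows]
    for w in words:
        grid[nearest(rows, w["top"])][nearest(cols, w["x0"])] = w["text"]
    return grid


# Made-up word boxes standing in for page.extract_words(); "__" marks an empty cell
layout = ["00 __ 02 __ 04", "__ 11 __ 13 __", "20 __ 22 23 24", "30 31 32 __ 34"]
words = [
    {"text": cell, "x0": 72.0 + 50.0 * c, "top": 100.0 + 18.0 * r}
    for r, line in enumerate(layout)
    for c, cell in enumerate(line.split())
    if cell != "__"
]

grid = words_to_grid(words)
for row in grid:
    print(row)
```

The catch: a column that is empty in every row leaves no x0 to cluster on, so it cannot be found this way, and centred or right-aligned cells would need x1 or the midpoint instead of x0.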

Messages In This Thread
RE: Extracting Data into Columns using pdfplumber - by Pedroski55 - Dec-15-2022, 05:27 AM
