Dec-15-2022, 05:27 AM
I'm starting to understand your problem!
Using reportlab I made a very simple pdf with 1 table of 4 rows and 5 columns. The table had borders.
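A pdf like that could be built along these lines. This is only a minimal sketch, not necessarily the exact script used for this test: the helper names `make_table_data` and `build_table_pdf` and the file name are made up for illustration, and reportlab is imported inside the function so the data helper also works without it.

```python
def make_table_data(n_rows=4, n_cols=5, empty_cols=(1, 3)):
    """Build 'rc' labels such as '00' or '12'; the given column indexes stay ''."""
    return [['' if c in empty_cols else f'{r}{c}' for c in range(n_cols)]
            for r in range(n_rows)]


def build_table_pdf(path, data, borders=True):
    """Write a pdf containing a single table (hypothetical helper)."""
    # Imported here so make_table_data() can be used without reportlab installed.
    from reportlab.lib import colors
    from reportlab.lib.pagesizes import A4
    from reportlab.platypus import SimpleDocTemplate, Table, TableStyle

    table = Table(data)
    if borders:
        # GRID draws both the outer box and the inner lines of the table.
        table.setStyle(TableStyle([('GRID', (0, 0), (-1, -1), 0.5, colors.black)]))
    SimpleDocTemplate(path, pagesize=A4).build([table])


data = make_table_data()
# build_table_pdf('table_with_borders.pdf', data)            # hypothetical file name
# build_table_pdf('table_no_borders.pdf', data, borders=False)
```

Calling `build_table_pdf(..., borders=False)` leaves out the grid lines, which is the borderless case discussed further down.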
The data for the table looks like this:
Quote:
data = [['00', '', '02', '', '04'],
        ['10', '', '12', '', '14'],
        ['20', '', '22', '', '24'],
        ['30', '', '32', '', '34']]
As you can see, columns 2 and 4 contain no data, just ''.
Using pdfplumber, I got the data back from the table:
import pdfplumber

# path2pdf and savename2 were defined earlier (not shown here)
# Parse pdf
with pdfplumber.open(path2pdf + savename2) as pdf:
    # Get the first page of the object
    page = pdf.pages[0]
    # Get the text data of the page
    text = page.extract_text()
    # Get all the tabular data of this page
    tables = page.extract_tables()
    # Traverse the tables
    for table in tables:
        # Traverse each row of data
        for row in table:
            print(row)

This gives the output below. You can see the empty values are kept as '', so you should be able to turn them into NaN in your pandas df, I suppose.
Output:
['00', '', '02', '', '04']
['10', '', '12', '', '14']
['20', '', '22', '', '24']
['30', '', '32', '', '34']
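With rows like these, listing the empty cells is straightforward. Here is a small sketch in plain Python; the function names `empty_cells` and `empty_columns` are made up, and the `None` check is only there in case a cell comes back as `None` instead of ''.

```python
def empty_cells(table):
    """Return (row, col) index pairs for cells that are '' (or None)."""
    return [(r, c)
            for r, row in enumerate(table)
            for c, cell in enumerate(row)
            if cell is None or cell.strip() == '']


def empty_columns(table):
    """Return the indexes of columns that are empty in every row."""
    n_cols = max(len(row) for row in table)
    empties = set(empty_cells(table))
    return [c for c in range(n_cols)
            if all((r, c) in empties for r in range(len(table)))]


# Same rows as the output above.
table = [['00', '', '02', '', '04'],
         ['10', '', '12', '', '14'],
         ['20', '', '22', '', '24'],
         ['30', '', '32', '', '34']]

print(empty_cells(table))
print(empty_columns(table))
```

The indexes count from 0, so the columns called 2 and 4 above come out as 1 and 3.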
But when I make a table with no borders, there is no table, just text:
Quote:
>>> tables
[]
>>> text
'00 02 04\n11 13\n20 22 23 24\n30 31 32 34'
>>>
Now, how to know which columns in which rows have nothing in them?
Think think!
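One idea, offered only as a sketch: instead of the plain text, look at where each word sits on the page. pdfplumber's `page.extract_words()` returns one dict per word with keys such as 'text', 'x0' and 'top', so the left edges can be grouped into columns and the tops into rows; any grid cell that receives no word is an empty one. The function name `words_to_grid`, the tolerances and the sample coordinates below are all invented for illustration, and it assumes left-aligned cells.

```python
def words_to_grid(words, x_tol=5.0, y_tol=3.0):
    """Rebuild a grid from word positions; cells with no word stay ''.

    `words` is a list of dicts with 'text', 'x0' and 'top' keys,
    the shape that pdfplumber's page.extract_words() returns.
    """
    def cluster(values, tol):
        # Keep the first position of each group; a value joins the
        # current group if it lies within `tol` of that position.
        centers = []
        for v in sorted(values):
            if not centers or v - centers[-1] > tol:
                centers.append(v)
        return centers

    def nearest(centers, v):
        # Index of the group position closest to v.
        return min(range(len(centers)), key=lambda i: abs(centers[i] - v))

    cols = cluster([w['x0'] for w in words], x_tol)
    rows = cluster([w['top'] for w in words], y_tol)

    grid = [['' for _ in cols] for _ in rows]
    for w in words:
        grid[nearest(rows, w['top'])][nearest(cols, w['x0'])] = w['text']
    return grid


# Made-up positions for the borderless text shown above:
# the label 'rc' is placed in row r, column c.
texts = ['00', '02', '04', '11', '13', '20', '22', '23', '24',
         '30', '31', '32', '34']
sample_words = [{'text': t,
                 'x0': 100.0 + 50.0 * int(t[1]),
                 'top': 72.0 + 18.0 * int(t[0])} for t in texts]

grid = words_to_grid(sample_words)
for row in grid:
    print(row)

# With a real pdf, the words would come from: words = page.extract_words()
```

A column that is empty in every row leaves no words behind, so this approach cannot see it; it only finds gaps in columns that have text in at least one row. Centred or right-aligned cells would need a different anchor than 'x0'.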