Hi
I have a PDF file from which I need to extract all the tables, along with the text above each table, and write the results to a CSV file. Using tabula, I have tried extracting the tables, but I am not sure how to extract the text above them. I need to extract the Perf factor, whose values are Accuracy and Time, and also the text below the Perf factor, which is the 'Description'. I have attached a sample input PDF file. My original PDF file has 100+ tables.
Attachment: Input.pdf (52.29 KB)

I have tried the following code to extract the tables:

    # Import the required module
    import tabula

    # Read the PDF; read_pdf returns a list of DataFrames, and [0] keeps only the first table
    df = tabula.read_pdf("Input.pdf", pages='all')[0]

    # Convert the PDF into CSV
    tabula.convert_into("Input.pdf", "Output.csv", output_format="csv", pages='all')
    print(df)

The expected output CSV should look as shown below:
Output:
Perf factor Accuracy
Description Accuracy of participants
Perf factor attributes Value
Category Football
Participants 11
Ballots Completed 1
Ballots Terminated 4
Perf factor Time
Description Total time taken
Perf factor attributes Value
Category Cricket
Participants 10
Ballots Completed 4
Ballots Terminated 9
Software I use:
Python 3.9.13
Anaconda Navigator, Spyder
I was wondering whether I should convert the PDF to text in order to extract the text, or whether there is a better way. Any help would be much appreciated. Thanks in advance.
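For the convert-to-text route, here is a minimal sketch of the parsing step only. It assumes the PDF text has already been extracted into a string by some tool, and that the extracted lines look like the expected output listed in this post. That layout is an assumption; the real extracted text may differ, for example in line order or spacing. The names SAMPLE_TEXT, parse_blocks and to_csv are hypothetical helpers, not part of tabula. It also assumes each table value is a single word or number at the end of the line.

```python
import csv
import io

# Assumed layout of the extracted text, modelled on the expected output in the post.
SAMPLE_TEXT = """\
Perf factor Accuracy
Description Accuracy of participants
Perf factor attributes Value
Category Football
Participants 11
Ballots Completed 1
Ballots Terminated 4
Perf factor Time
Description Total time taken
Perf factor attributes Value
Category Cricket
Participants 10
Ballots Completed 4
Ballots Terminated 9
"""

HEADER = "Perf factor attributes"  # first cell of each table's header row
PERF = "Perf factor"
DESC = "Description"


def parse_blocks(text):
    """Group lines into one record per table: perf factor, description, rows."""
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(HEADER):
            # Table header row; checked first because it shares the "Perf factor" prefix.
            continue
        if line.startswith(PERF + " "):
            # A new "Perf factor" line starts a new block.
            current = {"perf_factor": line[len(PERF):].strip(),
                       "description": "", "rows": []}
            blocks.append(current)
        elif current is None:
            continue  # ignore any text before the first "Perf factor" line
        elif line.startswith(DESC + " ") and not current["rows"]:
            current["description"] = line[len(DESC):].strip()
        else:
            # Table row: assume the last space-separated token is the value.
            name, _, value = line.rpartition(" ")
            current["rows"].append((name, value))
    return blocks


def to_csv(blocks):
    """Serialise the blocks as CSV text, one two-column row per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for block in blocks:
        writer.writerow([PERF, block["perf_factor"]])
        writer.writerow([DESC, block["description"]])
        writer.writerow([HEADER, "Value"])
        writer.writerows(block["rows"])
    return buf.getvalue()


blocks = parse_blocks(SAMPLE_TEXT)
csv_text = to_csv(blocks)
print(csv_text)
```

To save the result, csv_text can be written to Output.csv with a normal file write. Because the parser depends only on the "Perf factor" and "Description" prefixes, the same loop should handle 100+ tables as long as the extracted text keeps that pattern.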