Python Forum
tabula-py, how to preserve a read_pdf() format and export to csv
#1
This code:
from tabula import read_pdf


pdf_path = r'C:\Users\Arthur\PycharmProjects\Leitor\relatorio_base.pdf'
df1 = read_pdf(pdf_path, pages='all', guess=True, area=(406, 24, 695, 589))

print(df1)
is giving me this output:

[    Unnamed: 0 Unnamed: 1  Unnamed: 2  Unnamed: 3 Unnamed: 4 Unnamed: 5
0        29129      Orion           2          20        8,1   Aprovado
1        29128      Orion           2          20        8,1   Aprovado
2        29127      Orion           2          20        8,1   Aprovado
3        29126      Orion           2          20        8,1   Aprovado
4        29125  Lightbury           2          20        8,1   Aprovado
5        29124  Lightbury           2          20        8,1   Aprovado
6        29123      Orion           0           5        5,6   Aprovado
7        29122      Orion           0           5        5,6   Aprovado
8        29121      Orion           0           5        5,6   Aprovado
9        29120      Orion           0           5        5,6   Aprovado
10       29119  Lightbury           0           5        5,6   Aprovado
11       29118  Lightbury           0           5        5,6   Aprovado
12       29117  Lightbury           0           5        5,6   Aprovado
13       29116  Lightbury           0           5        5,6   Aprovado]
This output is basically what I need; it's perfect for what I'm looking for. Is there a way I can save this list as a .csv? I tried the following approaches, taken from an example I found on the internet:

salary = [['Alice', 'Data Scientist', 122000],
          ['Bob', 'Engineer', 77000],
          ['Ann', 'Manager', 119000]]


# Method 1
import csv
with open('file.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(salary)


# Method 2
import pandas as pd
df = pd.DataFrame(salary)
df.to_csv('file2.csv', index=False, header=False)


# Method 3
a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]


import numpy as np
a = np.array(a)
np.savetxt('file3.csv', a, delimiter=',')


# Method 4
with open('file4.csv','w') as f:
    for row in salary:
        for x in row:
            f.write(str(x) + ',')
        f.write('\n')
None of them worked; they all gave me only the first line: Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5. Does anyone know how to convert this to a .csv file? Thanks.
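
A possible explanation for the header-only result, sketched under assumptions: if df1 (a list holding one DataFrame) was passed where the examples use salary, then each "row" the writer sees is the whole DataFrame, and iterating over a DataFrame yields its column labels rather than its rows. The snippet below uses a hand-built two-row table as a stand-in for the read_pdf() result, so it runs without tabula or a PDF; the stand-in and the in-memory buffer are illustration choices, not part of the original code.

```python
import csv
import io

import pandas as pd

# Stand-in for the read_pdf() result: a list holding one DataFrame.
# Values are copied from the first two rows of the printed output.
table = pd.DataFrame(
    [[29129, "Orion", 2, 20, "8,1", "Aprovado"],
     [29128, "Orion", 2, 20, "8,1", "Aprovado"]],
    columns=[f"Unnamed: {i}" for i in range(6)],
)
df1 = [table]

# Method 1 from the post, applied to df1 and written to an in-memory
# buffer instead of a file.
buffer = io.StringIO()
csv.writer(buffer).writerows(df1)

# The list has one element (the DataFrame), so one line is written, and
# that line holds the column labels produced by iterating the DataFrame.
written = buffer.getvalue()
print(written)
```

The data rows never reach the writer because the loop runs over the list, not over the table inside it.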
#2
tabula is returning a pandas DataFrame, but it is wrapped in a list, so use df1 = df1[0].
Then you can export the DataFrame to many formats, for example with DataFrame.to_csv.
Also look at the tabula-py example notebook.
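
A minimal sketch of that suggestion, assuming the list holds a single table as in the printed output. The DataFrame is built by hand here as a stand-in for the read_pdf() result so the snippet runs without tabula or a PDF, and the file name mentioned in the comment is hypothetical.

```python
import pandas as pd

# Stand-in for the read_pdf() result: a list holding one DataFrame.
# Values are copied from the first two rows of the printed output.
table = pd.DataFrame(
    [[29129, "Orion", 2, 20, "8,1", "Aprovado"],
     [29128, "Orion", 2, 20, "8,1", "Aprovado"]],
    columns=[f"Unnamed: {i}" for i in range(6)],
)
df1 = [table]

# Take the DataFrame out of the list before exporting.
df = df1[0]

# With no path, to_csv() returns the CSV text. Pass a file name such as
# 'relatorio.csv' (hypothetical) as the first argument to write a file.
# index=False drops the 0, 1, 2... row labels; header=False drops the
# "Unnamed: N" column labels.
csv_text = df.to_csv(index=False, header=False)
print(csv_text)
```

Values such as 8,1 contain a comma, so to_csv() wraps them in double quotes to keep them in a single field. If that is unwanted, passing a different separator, for example sep=';', is one option.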
#3
(Mar-24-2021, 08:25 PM)snippsat Wrote: tabula is returning a pandas DataFrame, but it is wrapped in a list, so use df1 = df1[0].
Then you can export the DataFrame to many formats, for example with DataFrame.to_csv.
Also look at the tabula-py example notebook.

Now it worked.

Thank you so much!!!