Python Forum

tabula-py, how to preserve a read_pdf() format and export to csv
This code:
from tabula import read_pdf


pdf_path = r'C:\Users\Arthur\PycharmProjects\Leitor\relatorio_base.pdf'
df1 = read_pdf(pdf_path, pages='all', guess=True, area=(406, 24, 695, 589))

print(df1)
is giving me this output:

[    Unnamed: 0 Unnamed: 1  Unnamed: 2  Unnamed: 3 Unnamed: 4 Unnamed: 5
0        29129      Orion           2          20        8,1   Aprovado
1        29128      Orion           2          20        8,1   Aprovado
2        29127      Orion           2          20        8,1   Aprovado
3        29126      Orion           2          20        8,1   Aprovado
4        29125  Lightbury           2          20        8,1   Aprovado
5        29124  Lightbury           2          20        8,1   Aprovado
6        29123      Orion           0           5        5,6   Aprovado
7        29122      Orion           0           5        5,6   Aprovado
8        29121      Orion           0           5        5,6   Aprovado
9        29120      Orion           0           5        5,6   Aprovado
10       29119  Lightbury           0           5        5,6   Aprovado
11       29118  Lightbury           0           5        5,6   Aprovado
12       29117  Lightbury           0           5        5,6   Aprovado
13       29116  Lightbury           0           5        5,6   Aprovado]
This output is basically what I need; it's perfect for what I'm looking for. Is there a way I can save this list as a .csv? I tried these approaches, taken from an example I found on the internet:

salary = [['Alice', 'Data Scientist', 122000],
          ['Bob', 'Engineer', 77000],
          ['Ann', 'Manager', 119000]]


# Method 1
import csv
with open('file.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(salary)


# Method 2
import pandas as pd
df = pd.DataFrame(salary)
df.to_csv('file2.csv', index=False, header=False)


# Method 3
a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]


import numpy as np
a = np.array(a)
np.savetxt('file3.csv', a, delimiter=',')


# Method 4
with open('file4.csv','w') as f:
    for row in salary:
        for x in row:
            f.write(str(x) + ',')
        f.write('\n')
None of them worked; they all returned only the first line: Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5. Does anyone know how to convert this to a .csv file? Thanks.
tabula returns Pandas DataFrames, but here read_pdf() has wrapped the DataFrame in a list, so use df1 = df1[0].
Then you can export the DataFrame to many formats, for example with DataFrame.to_csv.
Also look at the tabula-py example notebook.
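A minimal sketch of that fix, and of the likely reason the csv.writer attempt wrote only the header line. It assumes read_pdf() returned a list holding one DataFrame, as in the output above. The two sample rows stand in for the real table so the snippet runs without the PDF, and the name tables is only illustrative.

```python
import csv
import io

import pandas as pd

# Stand-in for the value returned by read_pdf(): a list holding one DataFrame.
# (Assumption: two sample rows copied from the printed output above.)
tables = [
    pd.DataFrame(
        [[29129, "Orion", 2, 20, "8,1", "Aprovado"],
         [29128, "Orion", 2, 20, "8,1", "Aprovado"]],
        columns=[f"Unnamed: {i}" for i in range(6)],
    )
]

# writerows() treats each list element as one row, and iterating over a
# DataFrame yields its column labels, so only the header line is written.
buf = io.StringIO()
csv.writer(buf).writerows(tables)
header_only = buf.getvalue()

# The fix: take the DataFrame out of the list and let pandas write the CSV.
df1 = tables[0]
# With no path argument, to_csv() returns the CSV text; pass a file name
# as the first argument to write a file instead.
csv_text = df1.to_csv(index=False, header=False)
print(csv_text)
```

header=False drops the "Unnamed: N" column labels, which is a choice rather than a requirement. Values such as 8,1 are quoted in the output because they contain the delimiter.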
(Mar-24-2021, 08:25 PM) snippsat wrote: tabula returns Pandas DataFrames, but here read_pdf() has wrapped the DataFrame in a list, so use df1 = df1[0].
Then you can export the DataFrame to many formats, for example with DataFrame.to_csv.
Also look at the tabula-py example notebook.

Now it worked.

Thank you so much!