Python Forum
Export data from PDF as tabular format
#6
Hi again! I understand my approach may not be useful in this situation.

I am not good with regex!

Just as a test, I did this:

First, I used cropImage(path2jpg) to crop out the first column, Insumo, and saved it as temp.jpg.
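cropImage() itself isn't shown here. As a rough idea of what that crop step can look like, here is a minimal sketch using Pillow's Image.crop. The function name crop_image, the file names and the box coordinates are all hypothetical; the real column position depends on the PDF.

```python
from PIL import Image

def crop_image(src_path, dst_path, box):
    """Crop one region of a page image and save it to dst_path.

    box is (left, upper, right, lower) in pixels, the tuple Image.crop expects.
    """
    with Image.open(src_path) as page:
        page.crop(box).save(dst_path)
    return dst_path

if __name__ == "__main__":
    # a blank 800x1000 "page" so the example runs without a real PDF image
    Image.new("RGB", (800, 1000), "white").save("page.jpg")
    # hypothetical box for the Insumo column: the left 300 px of the page
    crop_image("page.jpg", "temp.jpg", (0, 0, 300, 1000))
    with Image.open("temp.jpg") as col:
        print(col.size)  # (300, 1000)
```

The box is the part you have to measure by hand for each layout, which is exactly why this only works when every PDF looks the same.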

Then I used the function below:

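import os
import pytesseract
from PIL import Image

# path2tempjpg, path2text and junkjpgs() are defined elsewhere in the script
def convert2text(name):
    # only one jpg now: the cropped Insumo column saved as temp.jpg
    jpgFile = os.path.join(path2tempjpg, 'temp.jpg')
    # append the OCR text; utf-8 keeps the accented Portuguese characters intact
    with open(os.path.join(path2text, name), 'a', encoding='utf-8') as this_text:
        # close the image before junkjpgs() deletes it (matters on Windows)
        with Image.open(jpgFile) as img:
            # OCR with Tesseract's Portuguese model ('por')
            porText = pytesseract.image_to_string(img, lang='por')
        this_text.write(porText)
    print('removing the jpgs ... ')
    junkjpgs(path2tempjpg)
    print('finished this image ... ')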
This gives:

Output:
Insumo
4094 - FECHADURA ELETRÔNICA PARA PORTA DE ABRIR - FE 21150 S/ MAÇANETA
4565 - CONTROLE REMOTO XAC 4000
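To get from this text to a table, a small regex is enough if each item comes out of Tesseract on its own line (an assumption; check your own OCR output). This sketch treats any line starting with a number followed by a dash as an item and writes code/description rows as CSV. The function names and the column headers "code" and "description" are made up for illustration.

```python
import csv
import io
import re

# an item line: a numeric code, a dash, then the rest of the line as description
ITEM = re.compile(r"^\s*(\d+)\s*-\s*(.+?)\s*$")

def parse_items(ocr_text):
    """Return (code, description) tuples; non-matching lines such as the header are skipped."""
    rows = []
    for line in ocr_text.splitlines():
        m = ITEM.match(line)
        if m:
            rows.append((m.group(1), m.group(2)))
    return rows

def rows_to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["code", "description"])
    writer.writerows(rows)
    return buf.getvalue()

sample = """Insumo
4094 - FECHADURA ELETRÔNICA PARA PORTA DE ABRIR - FE 21150 S/ MAÇANETA
4565 - CONTROLE REMOTO XAC 4000
"""

print(rows_to_csv(parse_items(sample)))
```

Only the first dash after the code is used as the separator, so dashes inside the description (like "- FE 21150") stay in the description.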
But like I said, this is only useful if all the PDFs have the same layout, because you need the exact coordinates of the column to crop the image.
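One small thing that can help: if the PDFs do share a layout but get converted to jpg at different resolutions, the crop box can be stored as fractions of the page instead of pixels and scaled to each image's size (Pillow gives it as Image.open(path).size). This is only a sketch; the fractions for the Insumo column below are hypothetical, and it does nothing for PDFs with a different layout.

```python
def scaled_box(fractions, width, height):
    """Convert a crop box given as fractions of the page into pixel coordinates.

    fractions is (left, upper, right, lower), each between 0 and 1;
    width and height are the page image size in pixels.
    """
    left, upper, right, lower = fractions
    return (
        round(left * width),
        round(upper * height),
        round(right * width),
        round(lower * height),
    )

# hypothetical Insumo column: left 37.5 % of the page, skipping top and bottom margins
INSUMO = (0.0, 0.1, 0.375, 0.9)

for size in [(800, 1000), (1600, 2000)]:
    print(size, scaled_box(INSUMO, *size))
```

The result can be passed straight to Image.crop.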


Messages In This Thread
Export data from PDF as tabular format - by zinho - Nov-08-2023, 09:28 PM
RE: Export data from PDF as tabular format - by Pedroski55 - Nov-11-2023, 08:23 AM
