Export data from PDF as tabular format


Pedroski55

Nov-11-2023, 08:23 AM

Hi again! I understand my approach may not be useful in this situation.

I am not good with regex!

Just as a test, I did this:

First, I used cropImage(path2jpg) to crop out the first column, Insumo, and saved it as temp.jpg.

Then I used the function below:

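import pytesseract
from PIL import Image

# path2tempjpg, path2text and junkjpgs() are assumed to be defined elsewhere in the script

def convert2text(name):
    # only 1 jpg now: the cropped column saved as temp.jpg
    jpgFile = path2tempjpg + 'temp.jpg'
    # OCR the image using the Portuguese language data; this works fine
    porText = pytesseract.image_to_string(Image.open(jpgFile), lang='por')
    # open in append mode, so repeated calls add to the same text file
    with open(path2text + name, 'a', encoding='utf-8') as this_text:
        this_text.write(porText)
    print('removing the jpgs ... ')
    junkjpgs(path2tempjpg)
    print('finished this image ... ')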

This gives:

Output:
Insumo

 

4094 - FECHADURA
ELETRÔNICA

PARA PORTA DE ABRIR - FE
21150 S/ MAÇANETA

 

4565 - CONTROLE REMOTO
XAC 4000
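Since regex came up: the OCR text above could probably be regrouped into one row per item with a single small pattern. This is only a sketch under some assumptions: that every item starts with a numeric code followed by a dash (like "4094 - FECHADURA"), that any other line continues the previous item, and that anything before the first item (the "Insumo" header) can be dropped. The names group_items, rows_to_csv and the CSV headers "codigo" and "insumo" are made up for the example.

```python
import csv
import io
import re

# Assumption: an item starts with digits, a dash, then text, e.g. "4094 - FECHADURA".
ITEM_START = re.compile(r"^(\d+)\s*-\s*(.+)$")


def group_items(ocr_text):
    """Group OCR lines into [code, description] rows."""
    rows = []
    for line in ocr_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip the blank lines Tesseract leaves between blocks
        match = ITEM_START.match(line)
        if match:
            # a new item begins here
            rows.append([match.group(1), match.group(2)])
        elif rows:
            # continuation line: append it to the current item's description
            rows[-1][1] += " " + line
        # lines before the first item (the column header) are ignored
    return rows


def rows_to_csv(rows):
    """Return the rows as CSV text with a hypothetical header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["codigo", "insumo"])
    writer.writerows(rows)
    return buffer.getvalue()


SAMPLE = """Insumo

4094 - FECHADURA
ELETRÔNICA

PARA PORTA DE ABRIR - FE
21150 S/ MAÇANETA

4565 - CONTROLE REMOTO
XAC 4000
"""

if __name__ == "__main__":
    print(rows_to_csv(group_items(SAMPLE)))
```

Note that "21150 S/ MAÇANETA" begins with digits but has no dash after them, so it is treated as a continuation line rather than a new item. If a real description ever wraps so that a line starts with "digits - text", this pattern would split it wrongly.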

But like I said, cropping like this is only useful if all the PDFs have the same layout, because you need the exact coordinates of each column to crop the image.
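One way to make the coordinates a little less fragile might be to store each column as fractions of the page size instead of pixels, so the same numbers still work if the page is rendered at a different resolution. It does not remove the need for the PDFs to share a layout. The sketch below only computes the box; column_box and the fractions used are hypothetical, and the result is in the (left, upper, right, lower) order that Pillow's Image.crop() expects.

```python
def column_box(page_width, page_height, left, top, right, bottom):
    """Convert fractional coordinates (0.0 to 1.0) into a pixel box.

    Returns (left, upper, right, lower) in pixels.
    """
    for value in (left, top, right, bottom):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must be between 0.0 and 1.0")
    if left >= right or top >= bottom:
        raise ValueError("box must have positive width and height")
    return (
        round(left * page_width),
        round(top * page_height),
        round(right * page_width),
        round(bottom * page_height),
    )


if __name__ == "__main__":
    # Hypothetical numbers: first column covers the left quarter of the page,
    # between 10% and 90% of the page height.
    print(column_box(1000, 2000, 0.0, 0.1, 0.25, 0.9))
```

With Pillow, the box would then be used roughly as image.crop(column_box(image.width, image.height, 0.0, 0.1, 0.25, 0.9)), where the fractions have to be measured once from a sample page.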
