Nov-09-2023, 06:56 PM
Do you want all of 1.pdf, or just the small table in the pdf that has Insumo in the top left column?
Do all these tables always have the same format?
I don't have Portuguese installed as a language in Tesseract, so I ran it with English. I get most of the text, but putting it all neatly into a table will be hard.
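A rough sketch of that language fallback, in case it helps: it asks Tesseract which languages are installed and uses Portuguese ("por") only if it is there, otherwise English ("eng"). The function names and the idea of passing in a page image file are my own assumptions, and the page has to be saved as an image first.

```python
def pick_ocr_language(installed, preferred="por", fallback="eng"):
    """Return the preferred Tesseract language code if installed, else the fallback."""
    return preferred if preferred in installed else fallback


def ocr_page(image_path):
    """OCR one page image (hypothetical helper, not from the original post)."""
    # Third-party imports live here so pick_ocr_language works without them.
    import pytesseract
    from PIL import Image

    # get_languages() lists the language data Tesseract has installed.
    lang = pick_ocr_language(pytesseract.get_languages(config=""))
    return pytesseract.image_to_string(Image.open(image_path), lang=lang)
```

With only English installed, pick_ocr_language returns "eng", which is what I ended up running.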
If all the pdfs have the same format, you can use Image to crop out each table and work on one table at a time. But because the Insumo column has multi-line cells, it will get confusing.
If all the tables always have the same dimensions, then you can use Image to crop out each column, extract the text and write it to Excel.
But I suspect the pdfs have different page layouts, and the tables in them will probably have different sizes!
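If the table dimensions do turn out to be fixed, the column-cropping idea could look something like this. It is only a sketch: I am assuming Image means Pillow's Image module, that the page is already saved as an image, and that you measure the pixel positions yourself. The names table_box, column_edges and both function names are made up for illustration.

```python
from itertools import zip_longest


def crop_columns(table_img, column_edges):
    """Split a table image into one image per column.

    column_edges lists the x pixel positions of the column borders,
    left to right, e.g. [0, 200, 350, 500] for three columns.
    """
    height = table_img.height
    return [
        table_img.crop((left, 0, right, height))  # (left, upper, right, lower)
        for left, right in zip(column_edges, column_edges[1:])
    ]


def table_to_excel(image_path, table_box, column_edges, xlsx_path, lang="eng"):
    """Crop one table, OCR each column, and write the columns to Excel."""
    # Third-party imports live here so crop_columns works without them.
    import pytesseract
    from openpyxl import Workbook
    from PIL import Image

    # table_box is (left, upper, right, lower) on the page image;
    # column_edges are then relative to the cropped table.
    table_img = Image.open(image_path).crop(table_box)
    columns = []
    for col_img in crop_columns(table_img, column_edges):
        text = pytesseract.image_to_string(col_img, lang=lang)
        columns.append([line.strip() for line in text.splitlines() if line.strip()])

    wb = Workbook()
    ws = wb.active
    # Pad shorter columns with empty cells so every row has the same length.
    for row in zip_longest(*columns, fillvalue=""):
        ws.append(list(row))
    wb.save(xlsx_path)
```

Note that this puts one OCR text line per Excel row, so the multi-line cells in the Insumo column will push that column out of step with the others. That part would still need fixing by hand or with extra logic.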