Aug-02-2019, 02:54 AM
Hi everyone,
An amateur Python developer here. I am trying to do some text extraction from a scanned PDF. The method I am following is scanned PDF to image to text (using Tesseract). I got reasonably good results when the PDF contained only text.
But when the PDF contained tables, I did not get coherent results: data from different rows and columns ran into each other in the output.
Looking for some help in extracting the tables from a scanned PDF - any and all ideas are much appreciated!
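To make the setup concrete, here is a rough sketch of the PDF to image to text pipeline described above, plus one possible idea for tables: ask Tesseract for word bounding boxes instead of plain text, then group words into rows by their vertical position. This is only a sketch under assumptions. The post names Tesseract but no Python packages, so `pdf2image` and `pytesseract` are assumed here, and `group_words_into_rows`, `ocr_pdf_rows`, and `row_tolerance` are hypothetical names made up for illustration.

```python
def group_words_into_rows(data, row_tolerance=10):
    """Group OCR words into table-like rows using their vertical centres.

    `data` is a dict of parallel lists with the keys "text", "left", "top"
    and "height", which is the shape pytesseract.image_to_data returns
    with output_type=Output.DICT. `row_tolerance` (pixels) is a
    hypothetical tuning knob, not a Tesseract setting.
    """
    words = [
        (top + height / 2, left, text.strip())
        for text, left, top, height in zip(
            data["text"], data["left"], data["top"], data["height"]
        )
        if text.strip()  # skip the empty entries Tesseract emits for blocks/lines
    ]
    words.sort()  # top to bottom by vertical centre

    rows, current, current_y = [], [], None
    for y, left, text in words:
        # Start a new row once a word sits too far below the row's first word.
        if current and abs(y - current_y) > row_tolerance:
            rows.append(current)
            current = []
        if not current:
            current_y = y
        current.append((left, text))
    if current:
        rows.append(current)

    # Within each row, order the words left to right.
    return [[text for _, text in sorted(row)] for row in rows]


def ocr_pdf_rows(pdf_path, dpi=300):
    """Render each PDF page to an image, OCR it, and return rows per page."""
    # Imported lazily so the grouping helper works without these packages.
    from pdf2image import convert_from_path  # assumed package, needs Poppler
    import pytesseract  # assumed package, needs the Tesseract binary

    pages = convert_from_path(pdf_path, dpi=dpi)
    result = []
    for page in pages:
        data = pytesseract.image_to_data(
            page, output_type=pytesseract.Output.DICT
        )
        result.append(group_words_into_rows(data))
    return result
```

Calling `ocr_pdf_rows("scan.pdf")` (the file name is a placeholder) would give one list of rows per page. Splitting each row into columns would still need the `left` coordinates, and skewed scans or multi-line cells would likely need a different tolerance or deskewing first.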