Python Forum
OCR-Python from Multi TIFF to HOCR getting only Data from 1st Page of multiple TIFF
Hi,

I need help fixing the code below. I am able to run it, and it writes hOCR output.
However, my requirement is to parse the data from every page of a multi-page TIFF image, and I am only getting data from the 1st page.



# Python program to extract text from all the images in a folder,
# storing the output in corresponding files in a different folder.
# This produces hOCR output, but only the 1st page of each TIFF is processed.
from PIL import Image
import pytesseract as pt
import os

pt.pytesseract.tesseract_cmd = r'C:\Users\admin\AppData\Local\Programs\Tesseract-OCR\tesseract.exe'


def main():
    # path for the folder containing the raw images
    path = "D:\\input"
    # path for the folder receiving the output
    tempPath = "D:\\output"

    # iterating over the images inside the folder
    for imageName in os.listdir(path):
        inputPath = os.path.join(path, imageName)
        img = Image.open(inputPath)

        # applying OCR using pytesseract (returns the hOCR document as bytes)
        text = pt.image_to_pdf_or_hocr(img, extension='hocr', config=r'--oem 3 --psm 6', lang="eng")

        fullTempPath = os.path.join(tempPath, 'time_' + imageName + ".hocr")
        print(text)

        # saving the output for every image in a separate .hocr file
        with open(fullTempPath, "wb") as file1:
            file1.write(text)


if __name__ == '__main__':
    main()
Thank you
Joe
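
A possible direction, offered as a sketch rather than a confirmed fix: the code above hands the freshly opened image to pytesseract once, so only the frame Pillow is currently positioned on (the first) is likely to reach Tesseract. Pillow exposes `n_frames` and `seek()` on multi-page images, so one option is to step through the frames and run OCR on each. The helper names `ocr_all_pages`, `hocr_page` and `convert_folder`, and the `_page<N>` suffix in the output file name, are assumptions made for illustration and are not part of the original post. It also assumes `tesseract_cmd` is already configured as in the code above.

```python
import os


def ocr_all_pages(img, ocr_page):
    """Call ocr_page once per frame of an opened image; return one result per page."""
    results = []
    # Single-frame images may not define n_frames, so fall back to 1.
    for index in range(getattr(img, "n_frames", 1)):
        img.seek(index)  # move Pillow to the requested page
        results.append(ocr_page(img))
    return results


def hocr_page(img):
    """Hypothetical helper: hOCR bytes for the current frame, same settings as the post."""
    import pytesseract as pt  # imported here so the loop above has no OCR dependency
    return pt.image_to_pdf_or_hocr(
        img, extension="hocr", config="--oem 3 --psm 6", lang="eng"
    )


def convert_folder(in_dir, out_dir):
    """Write one .hocr file per page for every image found in in_dir."""
    from PIL import Image
    for image_name in os.listdir(in_dir):
        with Image.open(os.path.join(in_dir, image_name)) as img:
            pages = ocr_all_pages(img, hocr_page)
        for page_number, hocr in enumerate(pages, start=1):
            # Assumed naming scheme: time_<image>_page<N>.hocr
            out_name = "time_{}_page{}.hocr".format(image_name, page_number)
            with open(os.path.join(out_dir, out_name), "wb") as out_file:
                out_file.write(hocr)
```

With the folders from the post, this would be called as `convert_folder("D:\\input", "D:\\output")`. Writing one file per page keeps each hOCR document valid on its own; merging the pages into a single file would need extra handling of the hOCR markup, which is not attempted here.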