Dear deanhystad,
I was a bit confused and tried a lot of things, because detecting the correct primary color ratio wasn't working properly...
(Among other things, I tried to use Counter instead of numpy.unique, but I wasn't able to get it working.)
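For reference, here is a minimal sketch of how collections.Counter could stand in for numpy.unique. It assumes the pixels are available as hashable values such as (r, g, b) tuples; the function name `primary_color_ratio_counter` is made up for this illustration and is not part of the code in this post.

```python
from collections import Counter


def primary_color_ratio_counter(pixels):
    """Return the share of pixels that have the most common color.

    `pixels` is any iterable of hashable color values, e.g. (r, g, b) tuples.
    """
    counts = Counter(pixels)
    if not counts:
        return 0.0
    # most_common(1) returns a list with one (color, count) pair
    _, most_common_count = counts.most_common(1)[0]
    return most_common_count / sum(counts.values())


# 10 x 10 white image with 2 colored pixels, built by hand as a flat list
pixels = [(255, 255, 255)] * 98 + [(255, 0, 0), (0, 0, 255)]
ratio = primary_color_ratio_counter(pixels)
print(ratio)
```

A NumPy array of shape (N, 3) cannot be passed in directly, because its rows are not hashable. One option is to convert the rows first with `map(tuple, rgb.tolist())`, which is likely slower than numpy.unique on a full page scan.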
I'm sorry for my slightly confused posting...
What do you think of my new idea to use:
    # reducing colors
    image = img.quantize(colors=2, method=None, kmeans=0, palette=None)
to reduce the colors?
My idea is to use it as a filter to get the correct primary color ratio.
I tested the code with a "clean" white 10 x 10 pixel BMP file that contains 2 colored pixels:
The result is that primColorRatio is 0.98.
filled_ratio is about 0.02.
So the image is considered filled.
Then I tested the code with a scanned white sheet of paper that contained only black writing.
The result is that primColorRatio is 0.912429823754084.
filled_ratio is about 0.08757017624591601.
So the image is considered filled.
Is this a suitable way to determine the primary color ratio?
I'm pretty confident...
If so, my next step will be to examine only every xth pixel.
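A rough sketch of what the "every xth pixel" step might look like, using a slice with a step on the flattened pixel array. The function name `primary_color_ratio_sampled`, the `step` parameter and the synthetic test page are assumptions for illustration only; the cast to uint32 is there so that packing the channels into one 24-bit value cannot overflow the uint8 pixel type.

```python
import numpy as np


def primary_color_ratio_sampled(rgb, step=1):
    """Estimate the primary color ratio from every `step`-th pixel.

    `rgb` is a uint8 array of shape (height, width, 3).
    """
    # Flatten to (N, 3), keep every step-th pixel, widen before packing
    flat = rgb.reshape(-1, 3)[::step].astype(np.uint32)
    if len(flat) == 0:
        return 0.0
    # Pack R, G, B into a single 24-bit value per pixel
    b24 = flat[:, 0] * 65536 + flat[:, 1] * 256 + flat[:, 2]
    _, counts = np.unique(b24, return_counts=True)
    return float(counts.max() / len(b24))


# Synthetic page: 100 x 100 pixels, top 90 rows white, bottom 10 rows black
page = np.full((100, 100, 3), 255, dtype=np.uint8)
page[90:] = 0
full_ratio = primary_color_ratio_sampled(page, step=1)
sampled_ratio = primary_color_ratio_sampled(page, step=7)
print(full_ratio, sampled_ratio)
```

One thing to watch with this kind of sampling: if `step` is a multiple of the image width, every sample comes from the same pixel column, so a step that does not divide the width is probably the safer choice.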
Best regards,
flash77
import time

import numpy as np
from pdf2image import convert_from_path
from PIL import Image


def primary_color_ratio(pdf_name):
    """Return ratio of pixels that are the "background" color."""
    pages = convert_from_path(pdf_name, 300)
    # save pdf to bmp (save() returns None, so there is nothing to assign)
    pages[0].save("bmpImage.bmp", "BMP")
    # open Image
    img = Image.open(r"bmpImage.bmp")
    # reducing colors
    image = img.quantize(colors=2, method=None, kmeans=0, palette=None)
    # convert to rgb
    image_rgb = image.convert('RGB')
    # I will do it later: examine just every xth pixel
    # cast to uint32 so the 24-bit packing below cannot overflow uint8
    rgb = np.array(image_rgb).reshape(-1, 3).astype(np.uint32)  # [::x]
    # get 24-bit color
    b24 = rgb[:, 0] * 65536 + rgb[:, 1] * 256 + rgb[:, 2]
    _, counts = np.unique(b24, return_counts=True)
    return max(counts) / max(len(b24), 1)


# "filled" is the content that can be read by humans (for example: writing)
# "userdef_image_filled_ratio_whole_page": the ratio at which an image is considered filled
pdf_name = "t2.pdf"
userdef_image_filled_ratio_whole_page = 0.02
startTime = time.time()
primColorRatio = primary_color_ratio(pdf_name)
print("primColorRatio = " + str(primColorRatio))
filled_ratio = 1 - primColorRatio
endTime = time.time()
print("filled_ratio = " + str(filled_ratio))
print("elapsed time: ", (endTime - startTime))
if filled_ratio >= userdef_image_filled_ratio_whole_page:
    print("The image is filled.")
else:
    print("The image is empty.")
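The function above converts the quantized image back to RGB and then counts packed 24-bit values. A possible shortcut, sketched below under the assumption that only the size of the largest color group matters, is to count the palette indices of the quantized image directly with Pillow's `getcolors()`. The function name `primary_color_ratio_quantized` and the in-memory test image are made up for this sketch, and it uses the default quantize settings rather than the exact arguments from the post.

```python
from PIL import Image


def primary_color_ratio_quantized(img, colors=2):
    """Return the share of pixels in the largest color group after quantizing."""
    quantized = img.convert("RGB").quantize(colors=colors)
    # For a palette image, getcolors() yields (count, palette_index) pairs
    color_counts = quantized.getcolors()
    total = quantized.width * quantized.height
    if not color_counts or total == 0:
        return 0.0
    return max(count for count, _ in color_counts) / total


# Hypothetical test image: 10 x 10 white with 2 red pixels
img = Image.new("RGB", (10, 10), "white")
img.putpixel((2, 3), (255, 0, 0))
img.putpixel((7, 8), (255, 0, 0))
ratio = primary_color_ratio_quantized(img)
print(ratio)
```

This skips the NumPy conversion entirely, which may help with run time on 300 dpi scans, but it has not been timed against the version above.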