Oct-04-2023, 04:47 AM
Dear deanhystad,
I'm trying to count the pixels that share the same RGB value. In a previous post I tried to detect when an image (and therefore a PDF) is empty, but that approach only worked for a white background. My new idea is this: count how many pixels there are of each color. If there is only one count, the image is empty. If there are two or more counts, ignore the largest one (the background) and add up all the remaining ones. That sum, divided by the total number of pixels, is the proportion of the image that is filled. It should be possible to specify a threshold proportion from which the image is considered filled.
    import numpy as np
    from PIL import Image

    # Load the image once and force three channels (R, G, B).
    image = Image.open("p2.bmp").convert("RGB")
    pixels = np.array(image)
    rows, columns, channels = pixels.shape
    pixels = pixels.reshape(-1, channels)  # one row per pixel

    # First attempt (abandoned): merge the channel values into strings.
    # mergeArray = []
    # for pixel in pixels:
    #     r, g, b = pixel
    #     a = str(r) + str(g) + str(b)
    #     mergeArray.append(a)
    # count_same_RGB = np.unique(mergeArray)

    # Count how often each distinct color occurs.
    arr_colors, arr_counts = np.unique(pixels, axis=0, return_counts=True)
    print(arr_colors)
    print(arr_counts)

    # Is the image empty? (largest_amount_pixels_same_RGB == amount_pixels_whole_image)
    amount_pixels_whole_image = rows * columns
    largest_amount_pixels_same_RGB = np.max(arr_counts)

    # Remove only the single largest count (the background) from arr_counts.
    # Matching on the value instead would also drop any color that ties with it.
    arr_counts_2 = np.delete(arr_counts, np.argmax(arr_counts))
    sum_portions_filled_pixels = arr_counts_2.sum() / amount_pixels_whole_image
    print("sum_portions_filled_pixels = " + str(sum_portions_filled_pixels))

    # User-defined proportion from which the image counts as filled.
    userdef_image_is_full = 0.1
    if len(arr_colors) == 1 or sum_portions_filled_pixels < userdef_image_is_full:
        print("The image is empty")
    else:
        print("The image is filled")

What do you think of my idea?
My test image is only 10 x 10 pixels.
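In case the attachment does not display, here is a minimal sketch of the same idea run on a synthetic 10 x 10 array instead of p2.bmp. The function name `fill_fraction`, the white background, and the block of 12 black pixels are made up for illustration; they are not taken from my real test image.

```python
import numpy as np


def fill_fraction(pixels):
    """Return the share of pixels that do not have the most common color.

    `pixels` is expected to be a (rows, columns, 3) RGB array.
    """
    flat = pixels.reshape(-1, 3)
    _, counts = np.unique(flat, axis=0, return_counts=True)
    # Only the single most frequent color is treated as background.
    return (counts.sum() - counts.max()) / counts.sum()


# Hypothetical stand-in for the test image: white page with a black block.
page = np.full((10, 10, 3), 255, dtype=np.uint8)
page[2:5, 3:7] = 0  # 3 rows x 4 columns = 12 black pixels

fraction = fill_fraction(page)
threshold = 0.1  # plays the same role as userdef_image_is_full
print(fraction)
print("The image is filled" if fraction >= threshold else "The image is empty")
```

With 12 of 100 pixels differing from the background, the fraction is 0.12, so this made-up page counts as filled at a threshold of 0.1. A single-color array gives 0.0, so the separate "only one color" check is not needed in this form.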
I'm planning to test scanned paper pages, which will be much bigger than 10 x 10 pixels.
Would it be better to use pandas for a real page (it seems to be much faster than numpy)?
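On the speed question, I suspect a good part of the time goes into `np.unique(..., axis=0)`, which has to compare whole rows. One thing I could try, as a rough sketch only, is to pack each RGB triple into a single integer and count those instead. The helper name `count_colors_packed` and the random three-color test array are my own assumptions for illustration; I have not timed this on a real scan.

```python
import numpy as np


def count_colors_packed(pixels):
    """Count identical colors by packing R, G, B into one 24-bit integer."""
    flat = pixels.reshape(-1, 3).astype(np.uint32)
    packed = (flat[:, 0] << 16) | (flat[:, 1] << 8) | flat[:, 2]
    values, counts = np.unique(packed, return_counts=True)
    # Unpack the integers back into (R, G, B) rows.
    colors = np.stack(
        [(values >> 16) & 0xFF, (values >> 8) & 0xFF, values & 0xFF], axis=1
    ).astype(np.uint8)
    return colors, counts


# Hypothetical data: a 200 x 300 image drawn from a three-color palette.
rng = np.random.default_rng(0)
palette = np.array([[255, 255, 255], [0, 0, 0], [200, 30, 30]], dtype=np.uint8)
pixels = palette[rng.integers(0, len(palette), size=(200, 300))]

colors, counts = count_colors_packed(pixels)
print(colors)
print(counts)
```

The same packed integers could also be handed to pandas, for example `pd.Series(packed).value_counts()`, so the two libraries can be compared on identical input.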
Thanks a lot for your outstanding, detailed help!!