Python Forum
identify not white pixels in bmp
#1
Dear community,

I would like to find the pixels of an image that are different from white.
The position is not important; later I would like to determine the proportion of pixels in the overall image that are different from white.
(I want to use this to determine whether a PDF that has been converted to a BMP for analysis is blank.)

The image p2 is completely white except for 1 blue, 1 green and 1 red pixel.
I would like to find these 3 pixels.

import numpy as np
from PIL import Image

img = Image.open("p2.bmp")

numpy_array = np.array(img)

# identify not white pixels
for i in numpy_array:
    if i[0] != 255 and [i][1] != 255 and [i][2] != 255:
        print(i)
Unfortunately, I don't understand the following error message:
Traceback (most recent call last):
  File "D:\Daten\aktuell\leerePDFs_erkennen\main.py", line 22, in <module>
    if i[0] != 255 and [i][1] != 255 and [i][2] != 255:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I would be very happy about a tip!!

Thanks a lot!!
#2
Hi,
Line 10: you seem to write the first element, i[0], in a different format than the other elements.
Line 10: if a pixel is white, its RGB sum is 765. It's easier to compute the sum; no need for "and". So look for sums < 765.
Paul
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
#3
What did you do to understand this problem before posting? Did you look at the shape of numpy_array? Did you read enough about numpy to know that they can be multidimensional?
 
numpy_array (a terrible variable name) will be a 3-dimensional array. You will know this if you look at the shape. The first dimension is the rows in the image. The second dimension is the pixels in each row, and the third dimension is the red, green, blue (and possibly opacity) values for each pixel. In your code, i[0] is the first pixel in a row, not red. [i][1] and [i][2] do not do what you meant: [i] is a one-element list, so indexing it with 1 or 2 is out of range.
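
To make the shape argument concrete, here is a minimal sketch. It assumes a synthetic 10 x 10 white RGB array (image_array is a hypothetical stand-in for np.array(Image.open("p2.bmp"))) and shows what the loop variable holds and why the comparison raises the ValueError from the first post.

```python
import numpy as np

# Hypothetical stand-in for np.array(Image.open("p2.bmp")): a 10 x 10 white RGB image
image_array = np.full((10, 10, 3), 255, dtype=np.uint8)
image_array[2, 3] = (255, 0, 0)  # one red pixel

print(image_array.shape)  # (rows, columns, channels)

first_row = image_array[0]       # what the loop variable i holds: a whole row
first_pixel = first_row[0]       # i[0]: the first pixel of that row, 3 values
comparison = first_pixel != 255  # element-wise: one bool per channel

try:
    # Using a multi-element array where a single bool is expected raises ValueError
    if comparison:
        pass
    raised = False
except ValueError:
    raised = True

print(first_row.shape, first_pixel.shape, comparison, raised)
```

The same thing happens with "and": it needs a single True or False, but the comparison yields three values.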

You could create an array of pixels by reshaping the array.
pixels = np.array(Image.open("p2.bmp"))
rows, columns, rgba = pixels.shape
print(rows, columns, rgba)
pixels = np.reshape(pixels, (-1, rgba))
Now you have an array of pixels, not an image. You can do this:
for pixel in pixels:
   r, g, b = pixel
Now you can see that your logic is wrong. RGB for cyan is (0, 255, 255). RGB for magenta is (255, 0, 255). RGB for yellow is (255, 255, 0). According to your program logic, cyan, magenta and yellow are all white. According to your logic, red, green and blue are also white. All of these colors have one or more components that == 255.
for pixel in pixels:
   r, g, b = pixel
   if r < 255 or g < 255 or b < 255:
       print(pixel)
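The difference between the two tests can be checked on a few named colors without any image at all. This is only a sketch; the helper names non_white_with_and and non_white_with_or are made up for the comparison.

```python
# Named colors as (r, g, b)
colors = {
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "cyan": (0, 255, 255),
    "magenta": (255, 0, 255),
    "yellow": (255, 255, 0),
    "black": (0, 0, 0),
}


def non_white_with_and(r, g, b):
    """The original test: only true when no component equals 255."""
    return r != 255 and g != 255 and b != 255


def non_white_with_or(r, g, b):
    """The corrected test: true when any component is below 255."""
    return r < 255 or g < 255 or b < 255


for name, rgb in colors.items():
    print(name, non_white_with_and(*rgb), non_white_with_or(*rgb))
```

Only the "or" version reports red, cyan, magenta and yellow as non-white; both versions agree on pure white and pure black.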
The description of your problem is unclear. Do you want a count of pixels, or is a single non-white pixel enough to satisfy the test? Do you care about the color of this pixel, or is it only important that it not be white?

If all you want to know is if there is any non-white pixel in the image, I would forget about pixel values and test the individual components.
import numpy as np
from PIL import Image

values = np.array(Image.open("test.png"))[:, :, :3].flatten()
print(np.any(values < 255))
Or
import numpy as np
from PIL import Image

image = Image.open("test.png")
pixels = np.array(image)
rgb_pixels = pixels[:, :, :3]  # Throw away any opacity information.
values = rgb_pixels.flatten()  # Make a 1-dimensional array
values_less_than_255 = values < 255  # array of bool.  True if value < 255
any_value_less_than_255 = np.any(values_less_than_255)  # Are any values < 255?
BMP files usually don't have opacity information. If you are only ever going to test a bitmap file, you can skip the slicing step.
#4
Dear DPaul, dear deanhystad,

thank you very much for your very good answers!

I wanted to get the number of non-white pixels.

With the help of the outstandingly good answers, I was able to determine this number.

I would like to apologize if I am not very experienced (numpy is still completely new to me).

Thank you again for having this great forum with these very capable members!!

Best regards, flash77

pixels = np.array(Image.open("p2.bmp"))
rows, columns, rgba = pixels.shape
# to have an array of pixels, not an image:
pixels = np.reshape(pixels, (-1, rgba))
count_of_non_white_pixels = 0
for pixel in pixels:
    r, g, b = pixel
    if r < 255 or g < 255 or b < 255:
        count_of_non_white_pixels += 1
print(count_of_non_white_pixels)
#5
To find the pixels in an image that are different from white and later determine the proportion of non-white pixels in the overall image, you can use image processing libraries such as Python's OpenCV. Here's a step-by-step guide:


1. Install OpenCV: If you haven't already, you'll need to install the OpenCV library. You can do this using pip:
pip install opencv-python

2. Load and Process the Image:

   import cv2
   import numpy as np

   # Load the image (OpenCV uses BGR channel order; imread returns None if the file cannot be read)
   image = cv2.imread('p2.bmp')

   # Define the color white (you may need to adjust this based on your image)
   white = np.array([255, 255, 255], dtype=np.uint8)

   # Create a mask that is 255 where a pixel is exactly white and 0 elsewhere
   white_mask = cv2.inRange(image, white, white)

   # Count the non-white pixels: total pixels minus white pixels
   total_pixels = image.shape[0] * image.shape[1]
   non_white_pixels = total_pixels - cv2.countNonZero(white_mask)

   # Calculate the proportion of non-white pixels in the image
   proportion_non_white = non_white_pixels / total_pixels

   # Display the proportion
   print(f"Proportion of non-white pixels: {proportion_non_white:.4f}")
This code will load your image ('p2.bmp'), build a mask of the pixels that are exactly white, and subtract the white count from the total to get the number of non-white pixels. It then calculates the proportion of non-white pixels in the image.

3. Adjust the White Threshold: In the code above, we assumed that white pixels have the RGB values [255, 255, 255]. If your white is slightly different, you may need to adjust these values accordingly.

4. Run the Code: Save your image as 'p2.bmp' in the same directory as the Python script, and run the code. It will print the proportion of non-white pixels in the image.

This code should help you determine whether your BMP image contains non-white pixels, which can be useful for detecting blank PDFs.
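For scanned pages, "white" is rarely exactly 255, so the threshold adjustment from step 3 matters in practice. The following is a rough sketch of that idea, written with NumPy rather than OpenCV; the function name non_white_fraction and the default white_threshold of 250 are assumptions chosen for illustration, not values from this thread.

```python
import numpy as np


def non_white_fraction(pixels, white_threshold=250):
    """Fraction of pixels with at least one RGB channel below white_threshold.

    pixels: array of shape (rows, columns, 3 or 4), dtype uint8.
    white_threshold is a hypothetical tuning parameter.
    """
    rgb = pixels[:, :, :3]                              # ignore any alpha channel
    non_white = np.any(rgb < white_threshold, axis=-1)  # one bool per pixel
    return np.count_nonzero(non_white) / max(non_white.size, 1)


# Synthetic 10 x 10 "scan": off-white paper with three colored pixels
page = np.full((10, 10, 3), 253, dtype=np.uint8)
page[0, 0] = (0, 0, 255)
page[4, 5] = (0, 255, 0)
page[9, 9] = (255, 0, 0)

print(non_white_fraction(page))                       # near-white counts as white
print(non_white_fraction(page, white_threshold=255))  # only exact white counts as white
```

With the tolerant threshold only the three colored pixels are counted; with an exact-white test the whole off-white page looks filled.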
buran write Oct-09-2023, 03:01 AM:
Spam link removed
#6
You should avoid using for loops for things like image processing. They are very slow compared to calling functions in numpy or cv2.
A way to count non-white pixels using numpy.
import numpy as np
from PIL import Image

# Make an array of pixels
pixels = np.array(Image.open("test.png"))[:, :, :3].reshape(-1, 3)

# Add up the RGB values
pixel_sums = np.sum(pixels, axis=1)

# Count number of pixels with sum < 765
not_white = np.count_nonzero(pixel_sums < 765)
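The original question also asked for the pixels themselves and for their share of the image. A sketch of both, again without loops, assuming a synthetic 10 x 10 array in place of the real file (image_array and the three pixel positions are made up for the example):

```python
import numpy as np

# Hypothetical stand-in for the image: 10 x 10 white RGB with three colored pixels
image_array = np.full((10, 10, 3), 255, dtype=np.uint8)
image_array[1, 1] = (255, 0, 0)
image_array[2, 2] = (0, 255, 0)
image_array[3, 3] = (0, 0, 255)

# True where any RGB component is below 255; shape (rows, columns)
mask2d = np.any(image_array[:, :, :3] < 255, axis=-1)

positions = np.argwhere(mask2d)       # (row, column) of each non-white pixel
not_white = np.count_nonzero(mask2d)  # how many there are
proportion = not_white / mask2d.size  # share of the whole image

print(positions)
print(image_array[mask2d])            # the RGB values of those pixels
print(not_white, proportion)
```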
#7
Dear deanhystad,
I'm trying to count the pixels that share the same RGB value. In a previous post I tried to determine when an image (and therefore a PDF) is empty, but that was limited to a white background. My idea is the following: count the pixels of each color. If there is only one count, the image is empty. If there are two or more counts, ignore the largest one (the background) and add up the remaining ones. Their share of the overall image is the proportion of the image that is filled. You should be able to specify the proportion above which the image is considered filled.

import numpy as np
import pandas as pd
from PIL import Image

image = Image.open("p2.bmp")
pixels = np.array(Image.open("p2.bmp").convert('RGB'))
rows, columns, rgba = pixels.shape
pixels = np.reshape(pixels, (-1, rgba))
#mergeArray = []
# for pixel in pixels:
#    r, g, b = pixel
#    a = str(r) + str(g) + str(b)
#    mergeArray.append(a)
# count_same_RGB = np.unique(mergeArray)
arr_colors, arr_counts = np.unique(pixels, axis=0, return_counts=True)
print(arr_colors)
print(arr_counts)

# is image empty? (largest_amount_pixels_same_RGB == amount_pixels_image)
amount_pixels_whole_image = image.width * image.height
largest_amount_pixels_same_RGB = np.max(arr_counts)
# # remove largest amount pixels same RGB from arr_counts
arr_counts_2 = np.delete(arr_counts, np.where(arr_counts == largest_amount_pixels_same_RGB))

sum_portions_filled_pixels = sum(arr_counts_2)/amount_pixels_whole_image
print("sum_portions_filled_pixels = " + str(sum_portions_filled_pixels))
#
userdef_image_is_full = 0.1
if len(arr_colors) == 1 or sum_portions_filled_pixels < userdef_image_is_full:
    print("The image is empty")
else:
    print("The image is filled")
What do you think of my idea?

My test image is only 10 x 10 pixels.
I'm planning to test scanned pages of paper (which will be much bigger than 10 x 10 pixels).
Would it be better to use pandas for a real page (it seems to be much faster than numpy)?

Thanks a lot for your outstanding, detailed help!!
#8
I think that should work, but it can be simplified. Compute the ratio of pixels that are the most common color. Compare ratio to a threshold.
from PIL import Image
import numpy as np

def primary_color_ratio(image):
    """Return ratio of pixels that are the "background" color."""
    pixels = np.array(image.getdata())
    _, counts = np.unique(pixels, axis=0, return_counts=True)
    return max(counts) / max(len(pixels), 1)


print(primary_color_ratio(Image.new("RGB", (10, 10), (255, 0, 0))))
print(primary_color_ratio(Image.open("test.jpg")))
Output:
1.0
0.032
#9
Dear deanhystad,

thanks a lot for your answer!!

I will deal with lots of scanned papers, so it is important to analyze them fast.
I noticed that pandas is faster than numpy.

Is it possible to use pandas for analyzing whether a page is blank or not?

import numpy as np
import pandas as pd
from PIL import Image
import time

userdef_image_recognized_as_filled = 0.2
startTime = time.time()
image = Image.open("2.jpg")
pixels = np.array(image.getdata())
_, counts = np.unique(pixels, axis=0, return_counts=True)
portion_background_color = max(counts) / (max(len(pixels), 1))
portion_filled = 1 - portion_background_color
endTime = time.time()
print("portion_filled = " + str(portion_filled))
print("portion_background_color = " + str(portion_background_color))
print(endTime - startTime)
if portion_filled >= userdef_image_recognized_as_filled:
    print("The image is filled.")
else:
    print("The image is empty.")

# is it possible to use pandas to get the information faster?
# I will have to examine lots of scanned pages

# could this be done with pandas?
# here I try to get the count of pixels for each unique RGB value
df = pd.DataFrame(pixels, columns=["r", "g", "b"])
count_pixels_same_RGB = df.groupby(["r", "g", "b"]).size()
Thank you for your patient help!!
#10
You are mistaken about pandas and numpy. Pandas uses numpy, so at best pandas can be almost as fast as numpy. Usually pandas is much slower. If pandas is doing something faster, it just means you are not using the right numpy functions. In a recent post about why pandas runs so much slower in Python 3.11 than it did in Python 3.9 (the issue is related to pandas versions, not Python), I made a numpy solution that runs 5000 times faster than the pandas solution in the post. Speed is more about HOW you do things than what tools you are using.

It takes 3 seconds for my primary_color_ratio(image) to process a 1920 x 1080 image. That is over a minute to process 1 second of film. Yuck!

I found it takes almost a second to get the pixels from the image using getdata(), so you should go back to using "pixels = np.array(image).reshape(-1, 3)" which takes less than 0.01 seconds.

Thinking that unique is taking a long time because we are looking for unique arrays, I converted each RGB array to a single 24-bit color value.
from PIL import Image
import numpy as np
from time import time


def primary_color_ratio(filename):
    """Return ratio of pixels that are the "background" color."""
    image = Image.open(filename)
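    # Widen to uint32 first; uint8 arithmetic would overflow when packing
    rgb = np.array(image).reshape(-1, 3).astype(np.uint32)
    # Pack R, G, B into one 24-bit integer per pixel
    b24 = rgb[:, 0] * 65536 + rgb[:, 1] * 256 + rgb[:, 2]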
    _, counts = np.unique(b24, return_counts=True)
    return max(counts) / max(len(b24), 1)


start = time()
print(primary_color_ratio("test.jpg"))
print(time() - start)
Output:
0.1281328125
0.23400020599365234
It takes twice as long if I replace numpy.unique with pandas.groupby.
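
The packing of three 8-bit channels into one integer can be checked on a tiny array. This is a sketch under stated assumptions: pack_rgb is a hypothetical helper, it uses bit shifts (equivalent to multiplying by 65536 and 256), and it casts to uint32 first so the intermediate values cannot overflow the uint8 range.

```python
import numpy as np


def pack_rgb(rgb):
    """Pack an (N, 3) uint8 array into one 24-bit integer per pixel."""
    wide = rgb.astype(np.uint32)  # widen first so the shifts cannot overflow uint8
    return (wide[:, 0] << 16) | (wide[:, 1] << 8) | wide[:, 2]


rgb = np.array(
    [[255, 255, 255], [255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]],
    dtype=np.uint8,
)
packed = pack_rgb(rgb)
values, counts = np.unique(packed, return_counts=True)

print([hex(v) for v in packed])
print(counts.max() / len(packed))  # ratio of the most common color
```

Because every distinct color maps to a distinct integer, np.unique on the packed values counts colors exactly, and it only has to sort a 1-dimensional array.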

6 seconds to process 1 second of film if you are doing it in HD. That's not terrible. You could probably do better if you wrote this in C. You wouldn't have to process the entire frame to decide if it was blank.

That's an interesting idea. Instead of processing every pixel, process 1 in 100 pixels or 1 in 4 rows or something like that. This uses every 100th pixel.
from PIL import Image
import numpy as np
from time import time


def primary_color_ratio(filename):
    """Return ratio of pixels that are the "background" color."""
    image = Image.open(filename)
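    # Keep every 100th pixel and widen to uint32 so the packing cannot overflow
    rgb = np.array(image).reshape(-1, 3)[::100].astype(np.uint32)
    # Pack R, G, B into one 24-bit integer per pixel
    b24 = rgb[:, 0] * 65536 + rgb[:, 1] * 256 + rgb[:, 2]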
    _, counts = np.unique(b24, return_counts=True)
    return max(counts) / max(len(b24), 1)


start = time()
print(primary_color_ratio("test.jpg"))
print(time() - start)
Output:
0.1255425347222222
0.05700945854187012
The result is slightly different, but it is also about 4x faster: roughly 1.4 seconds to process 1 second of film. The time to read the image and convert it to pixels begins to be significant, and further reducing the number of pixels provides smaller and smaller speed gains. For example, processing every 1000th pixel takes 0.055 seconds.

Because none of the frames depend on other frames, processing multiple frames in parallel would be a simple task.
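
A rough sketch of what that could look like with the standard library, assuming the frames are already available as NumPy arrays. make_frame is a made-up helper that builds test frames, and a thread pool is used only to keep the example short; a process pool may well give better speedups for real workloads.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def primary_color_ratio(frame):
    """Return ratio of pixels that are the most common color in one frame."""
    rgb = frame.reshape(-1, 3).astype(np.uint32)
    b24 = (rgb[:, 0] << 16) | (rgb[:, 1] << 8) | rgb[:, 2]
    _, counts = np.unique(b24, return_counts=True)
    return counts.max() / max(len(b24), 1)


def make_frame(filled_rows):
    """Hypothetical test frame: white 20 x 20 image with some black rows."""
    frame = np.full((20, 20, 3), 255, dtype=np.uint8)
    frame[:filled_rows] = 0
    return frame


frames = [make_frame(n) for n in (0, 5, 10)]

# Each frame is independent, so the work can be mapped over a pool of workers
with ThreadPoolExecutor(max_workers=4) as pool:
    ratios = list(pool.map(primary_color_ratio, frames))

print(ratios)
```

pool.map returns the results in the same order as the input frames, so the ratios can be matched back to page or frame numbers directly.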