Python Forum
Count image's colors very fast
#1
Dear community,

I found some code on Stack Overflow that is supposed to count an image's colors very fast.
https://stackoverflow.com/questions/7139...n-an-image

I'm trying to get the second function working:

def count_colors_2(cv_img: np.array) -> list: # no need to give colors

The situation is the following:

I have a PDF file ("t2.pdf"), which I convert to the BMP file "bmpImage.bmp" (line 23).

Then I open the image with OpenCV (line 25).

I don't know why "colors_count_list" is None (line 14).

Here is my attempt:

import time
import numpy as np
from PIL import Image
from pdf2image import convert_from_path
import cv2

colors_count_list = []


def count_colors_2(cv_image: np.array) -> list:  # no need to give colors
    pil_image = Image.fromarray(cv_image)
    colors_count_list = pil_image.getcolors()
    print('count_colors time elapsed: {:.10f}s'.format(time.time() - start_time))
    for count, c_bgr in colors_count_list:
        print('\tcolor {} appeared {} times'.format(c_bgr, count))
    return colors_count_list


if __name__ == '__main__':
    start_time = time.time()
    # save pdf to bmp
    pages = convert_from_path("t2.pdf", 300)
    pages[0].save("bmpImage.bmp", "BMP")
    # Open image using openCV2
    opencv_image = cv2.imread("bmpImage.bmp")
    colors_count_list = count_colors_2(opencv_image)
    print(colors_count_list)
Error:
Traceback (most recent call last):
  File "D:\Daten\aktuell\testOpenCVColorCount\main.py", line 26, in <module>
    colors_count_list = count_colors_2(opencv_image)
  File "D:\Daten\aktuell\testOpenCVColorCount\main.py", line 14, in count_colors_2
    for count, c_bgr in colors_count_list:
TypeError: 'NoneType' object is not iterable

Process finished with exit code 1
Please be so kind and help me...

Many thanks...
Reply
#2
From the PIL Image documentation

Quote: Image.getcolors(maxcolors=256)
Returns a list of colors used in this image.

The colors will be in the image’s mode. For example, an RGB image will return a tuple of (red, green, blue) color values, and a P image will return the index of the color in the palette.

PARAMETERS:
maxcolors – Maximum number of colors. If this number is exceeded, this method returns None. The default limit is 256 colors.
Your image must have more than 256 colors, so getcolors() returns None. It worked fine when I passed an image with 5 colors.
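As a rough sketch of the behaviour described in the quoted documentation: the snippet below builds a small synthetic image (an assumption, standing in for the scanned page) with 1024 distinct colors, then calls getcolors() once with the default limit and once with maxcolors set to the pixel count, which the number of colors can never exceed.

```python
import numpy as np
from PIL import Image

# Synthetic 32x32 RGB image in which every pixel has a different color
# (1024 distinct colors, more than the default limit of 256).
values = np.arange(32 * 32)
rgb = np.stack([values % 256, values // 256, np.zeros_like(values)], axis=-1)
image = Image.fromarray(rgb.reshape(32, 32, 3).astype(np.uint8))

# Default maxcolors=256 is exceeded, so this is None.
default_result = image.getcolors()

# An image cannot contain more colors than pixels, so this limit always suffices.
width, height = image.size
full_result = image.getcolors(maxcolors=width * height)

print(default_result)
print(len(full_result))
```

Each entry of the returned list is a (count, color) pair, which is why the loop in the question unpacks `count, c_bgr`.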
Reply
#3
Your code doesn't work for me either!

Using only PIL's Image module, you can get what you want:

from PIL import Image

img = '/home/pedro/Pictures/demeter2.jpeg' # multi-coloured harvest scene
img2 = '/home/pedro/Pictures/Greek-flag.jpg' # blue and white

im = Image.open(img2).convert("L")
im1 = im.getcolors()  # gives output
im = Image.open(img2).convert("RGB")
im1 = im.getcolors()  # None (no output)
im = Image.open(img2).convert("CMYK")
im1 = im.getcolors()  # None (no output)
im = Image.open(img2).convert("P")
im1 = im.getcolors()  # gives different output to "L"
im = Image.open(img).convert("P")
im1 = im.getcolors()
The last output for img:

Output:
[(123, 0), (2542, 11), (3208, 12), (1034, 13), (331, 14), (1, 15), (14, 16), (1989, 17), (7476, 18), (3983, 19), (2996, 20), (165, 21), (16, 23), (80, 24), (356, 25), (279, 26), (25, 27), (1, 31), (41, 46), (1000, 47), (2177, 48), (2679, 49), (534, 50), (2, 51), (57, 52), (2015, 53), (10738, 54), (19666, 55), (23753, 56), (3704, 57), (68, 59), (2636, 60), (20338, 61), (65101, 62), (14642, 63), (15, 66), (2996, 67), (31908, 68), (11880, 69), (1, 73), (242, 74), (265, 75), (3, 82), (3, 83), (4, 84), (19, 88), (153, 89), (253, 90), (926, 91), (1338, 92), (298, 93), (72, 95), (355, 96), (8437, 97), (36334, 98), (11585, 99), (1, 101), (13, 102), (12112, 103), (119076, 104), (106189, 105), (217, 109), (28158, 110), (64745, 111), (23, 116), (215, 117), (5, 125), (1, 127), (1, 130), (22, 131), (32, 132), (19, 133), (21, 134), (2, 135), (2, 137), (21, 138), (362, 139), (13002, 140), (16431, 141), (55, 145), (30329, 146), (84806, 147), (1285, 152), (11728, 153), (6, 174), (7, 175), (4, 176), (2, 181), (732, 182), (4137, 183), (350, 188), (4514, 189), (3, 217), (1, 218), (1, 219), (1, 224), (12, 225)]
len(im1)
97
If you just do this, you get nothing:

im = Image.open(img)
im1 = im.getcolors()
im1
What the parameter maxcolors=256 is supposed to do I don't know. I tried with much bigger numbers and got nothing.

Why .convert("P") is needed is also a mystery to me!
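One possible explanation for these observations, as a hedged sketch: modes "L" and "P" store a single 8-bit value per pixel, so an image in those modes can never contain more than 256 distinct values and the default maxcolors=256 is never exceeded. The gradient image below is a made-up stand-in for a many-coloured photo.

```python
import numpy as np
from PIL import Image

# Made-up 64x64 RGB gradient with 4096 distinct colors.
ys, xs = np.mgrid[0:64, 0:64]
rgb = np.stack([xs * 4, ys * 4, (xs + ys) * 2], axis=-1).astype(np.uint8)
image = Image.fromarray(rgb)

# Call getcolors() with the default maxcolors=256 in each mode.
results = {mode: image.convert(mode).getcolors() for mode in ("RGB", "L", "P")}

for mode, colors in results.items():
    print(mode, None if colors is None else len(colors))
```

Note that converting to "L" or "P" merges many original colors into one value, so the counts describe the converted image, not the original.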
Reply
#4
Dear deanhystad, dear Pedroski55,

thanks a lot for your answers!

In the meantime I read the information about maxcolors too...

Because I have to analyze pictures with lots of colors, I will go back to the solution in thread "identify not white pixels in bmp", post #18.

I will have to analyze scanned pages (DIN A4) to find empty pages and will use multiprocessing later.

Is there a way to lower the processing time besides multiprocessing?

Many thanks...
Reply
#5
My list of the best ways to improve speed, in order of their impact:
1. Efficient algorithm (I've seen results thousands of times faster from a better algorithm).
2. Minimize the amount of Python code (use external libraries; up to hundreds of times faster than code written entirely in Python).
3. Multiprocessing (typically 1.5 to 3 times faster if you use 2 to 4 cores).
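As a rough illustration of the point about minimizing Python code, here is the same pixel count written once as a plain Python loop and once as a single NumPy call. The function names, the threshold of 250, and the synthetic page are assumptions for the sketch; no timings are claimed here.

```python
import numpy as np


def count_dark_pixels_loop(gray, threshold=250):
    """Count pixels darker than threshold with a plain Python loop."""
    count = 0
    for row in gray:
        for value in row:
            if value < threshold:
                count += 1
    return count


def count_dark_pixels_numpy(gray, threshold=250):
    """Same count, but the loop runs inside NumPy's compiled code."""
    return int(np.count_nonzero(gray < threshold))


# Synthetic stand-in for a scanned page: white with one dark block.
page = np.full((200, 300), 255, dtype=np.uint8)
page[50:60, 100:120] = 0

print(count_dark_pixels_loop(page), count_dark_pixels_numpy(page))
```

Both functions return the same number; the difference is only where the per-pixel work happens, which matters on a 300 dpi A4 page with several million pixels.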
Reply
#6
To count the number of colours in an image, which can be as high as 16 million, I believe, use ImageMagick from the command line for a quick result.

This image, img3, is quite big and has a lot of colours, but fewer than 250 000:

From the command line, bash shell:

Quote:identify -format %k /home/pedro/Downloads/damage_back_left_edge.jpg

The above command returns 243704:

Quote:pedro@pedro-HP:~$ identify -format %k /home/pedro/Downloads/damage_back_left_edge.jpg
243704pedro@pedro-HP:~$
Pass that number to PIL as maxcolors, or round up to, say, 250 000:

from PIL import Image

img3 = '/home/pedro/Downloads/damage_back_left_edge.jpg'
im = Image.open(img3).convert("RGB")
im1 = im.getcolors(maxcolors=250000)
len(im1)
Output:
243704
Reply
#7
Hi deanhystad,
hi Pedroski55,

thanks a lot for your answers!

I experimented a bit...

There is the function that deanhystad helped me with a lot ("primary_color_ratio()"): it takes 1.634 seconds to run.

There is the function that I wrote ("pdf_to_image_array()"): it takes 1.736 seconds to run.

There is the function that I found online ("count_colors_2()"): it takes 0.012 seconds to run.

Could you please give me some advice on what I should do?

My goal is:
Convert PDFs to BMPs and detect empty pages in a (very) short time.
When this works, it should be optimized with multiprocessing (I already got very good help on this topic in this forum).

import numpy as np
from pdf2image import convert_from_path
from PIL import Image
import time


def primary_color_ratio(pdf_name):
    """Return ratio of pixels that are the "background" color."""
    pages = convert_from_path(pdf_name, 300)
    # save pdf to bmp
    pages[0].save("bmpImage.bmp", "BMP")  # save() returns None, so nothing to assign
    # open Image
    img = Image.open(r"bmpImage.bmp")
    # reducing colors
    image_reduced = img.quantize(colors=2, method=None, kmeans=0, palette=None)
    #image_reduced.show()
    # convert to rgb
    image_rgb = image_reduced.convert('RGB')
    # I will do it later: examine just every xth pixel
    rgb = np.array(image_rgb).reshape(-1, 3)#[::x]
    # get 24Bit Color
    b24 = rgb[:, 0] * 65536 + rgb[:, 1] * 256 + rgb[:, 2]
    _, numberOfColors = np.unique(b24, return_counts=True)
    return max(numberOfColors) / max(len(b24), 1)


def pdf_to_image_array(pdf_name):
    pages = convert_from_path(pdf_name, 300)
    # save pdf to bmp
    pages[0].save("bmpImage.bmp", "BMP")  # save() returns None, so nothing to assign
    # open Image
    img = Image.open("bmpImage.bmp")
    # reducing colors
    image_reduced = img.quantize(colors=2, method=None, kmeans=0, palette=None)
    image_array = np.array(image_reduced)
    return image_array


def count_colors_2(image_array) -> list:  # no need to give colors
    pil_image = Image.fromarray(image_array)
    colors_count_list = pil_image.getcolors(2)
    for count, c_bgr in colors_count_list:
        print('\tcolor {} appeared {} times'.format(c_bgr, count))
    return colors_count_list


pdf_name = "t2.pdf"
image_array = pdf_to_image_array(pdf_name)
start_time = time.time()
count_colors_2(image_array)
print('count_colors_2 time elapsed: {:.10f}s'.format(time.time() - start_time))
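A minimal sketch of one way to test a page for emptiness directly on a PIL image such as the ones convert_from_path returns, without the intermediate BMP file: convert to grayscale and measure the fraction of pixels darker than a cutoff. The function name is_blank_page and both thresholds (white_threshold, max_ink_ratio) are assumptions for illustration and would need tuning on real scans, which contain noise and specks.

```python
import numpy as np
from PIL import Image


def is_blank_page(page_image, white_threshold=245, max_ink_ratio=0.001):
    """Return True if almost no pixel is darker than white_threshold."""
    gray = np.asarray(page_image.convert("L"))
    ink_ratio = np.count_nonzero(gray < white_threshold) / gray.size
    return ink_ratio <= max_ink_ratio


# Synthetic pages standing in for converted PDF pages.
empty_page = Image.new("RGB", (100, 100), "white")
printed_page = Image.new("RGB", (100, 100), "white")
printed_page.paste((0, 0, 0), (10, 10, 60, 60))  # black block as fake content

print(is_blank_page(empty_page), is_blank_page(printed_page))
```

With pdf2image this could be called as `is_blank_page(pages[0])`, since the list entries are already PIL images.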
Reply
#8
I wouldn't expect pil_image.getcolors(2) to take very long to find 3 colors and return None.
Reply
#9
Code source: someone has always done these things before!

I didn't get the part that you only want to find blank pages. Sorry.

If a page with no text is "a blank page" (could only contain an image I suppose) then this will save all that messing around with pixels!

import fitz  # PyMuPDF

# check whether the page has text or not.
def check_page(page):
    text = page.get_text()
    return len(text.strip()) == 0

path2infile = "/home/pedro/pdfs/pdfs/doctor_visits_with_blank_pages.pdf" # 5 pages, 2 pages no text
path2outfile = "/home/pedro/pdfs/pdfs/doctor_visits_no_blank_pages.pdf" # ends up with 3 pages

input_pdf = fitz.open(path2infile)
output_pdf = fitz.open()

for pgno in range(input_pdf.page_count):
    page = input_pdf[pgno]
    if not check_page(page):
        output_pdf.insert_pdf(input_pdf, from_page=pgno, to_page=pgno)

output_pdf.save(path2outfile)
input_pdf.close()
output_pdf.close()
You can add another function to check for images, if no text is found!

But, if all pages are numbered, that is text!
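A hedged sketch of such an extra check, assuming a PyMuPDF page object: the function name page_has_no_content is made up for illustration, and it relies on Page.get_images(), which lists the images referenced by a page. A scanned page is usually stored as one full-page image, so this check alone would probably not recognise a blank scan.

```python
def page_has_no_content(page):
    """Return True if a PyMuPDF page has neither text nor embedded images."""
    has_text = len(page.get_text().strip()) > 0
    has_images = len(page.get_images()) > 0
    return not (has_text or has_images)
```

In the loop above, `check_page(page)` could be swapped for `page_has_no_content(page)` to keep pages that contain only an image.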
Reply
#10
Doh!

In my defense, this did start as a previous thread titled: identify not white pixels in bmp. The pdf came later. Still, Doh!
Reply

