Posts: 148
Threads: 34
Joined: May 2020
Dear community,
I found some code on Stack Overflow that is supposed to count an image's colors very quickly.
https://stackoverflow.com/questions/7139...n-an-image
I'm trying to get the second function working:
def count_colors_2(cv_img: np.array) -> list: # no need to give colors
The situation is the following:
I've got a pdf-file ("t2.pdf"), which I convert to the bmp-file "bmpImage.bmp" (line 23).
Then I open the image with openCV2 (line 25).
I don't know why "colors_count_list" is NoneType (line 14).
Here is my attempt:
import time
import numpy as np
from PIL import Image
from pdf2image import convert_from_path
import cv2

colors_count_list = []


def count_colors_2(cv_image: np.array) -> list:  # no need to give colors
    pil_image = Image.fromarray(cv_image)
    colors_count_list = pil_image.getcolors()
    print('count_colors time elapsed: {:.10f}s'.format(time.time() - start_time))
    for count, c_bgr in colors_count_list:
        print('\tcolor {} appeared {} times'.format(c_bgr, count))
    return colors_count_list


if __name__ == '__main__':
    start_time = time.time()
    # save pdf to bmp
    pages = convert_from_path("t2.pdf", 300)
    pages[0].save("bmpImage.bmp", "BMP")
    # Open image using openCV2
    opencv_image = cv2.imread("bmpImage.bmp")
    colors_count_list = count_colors_2(opencv_image)
    print(colors_count_list)
Error:
Traceback (most recent call last):
  File "D:\Daten\aktuell\testOpenCVColorCount\main.py", line 26, in <module>
    colors_count_list = count_colors_2(opencv_image)
  File "D:\Daten\aktuell\testOpenCVColorCount\main.py", line 14, in count_colors_2
    for count, c_bgr in colors_count_list:
TypeError: 'NoneType' object is not iterable

Process finished with exit code 1
Please be so kind and help me...
Many thanks...
Posts: 6,777
Threads: 20
Joined: Feb 2020
Feb-03-2024, 08:25 PM
(This post was last modified: Feb-03-2024, 08:25 PM by deanhystad.)
From the PIL Image documentation
Quote: Image.getcolors(maxcolors=256)
Returns a list of colors used in this image.
The colors will be in the image’s mode. For example, an RGB image will return a tuple of (red, green, blue) color values, and a P image will return the index of the color in the palette.
PARAMETERS:
maxcolors – Maximum number of colors. If this number is exceeded, this method returns None. The default limit is 256 colors.
Your image must have more than 256 colors. It worked fine when I passed an image with 5 colors.
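To illustrate the documented behavior, here is a minimal sketch using a synthetic image; the 512-color test array is made up for the demonstration and is not the poster's scan. Since an image cannot contain more distinct colors than it has pixels, passing width * height as maxcolors should always yield a list instead of None.

```python
import numpy as np
from PIL import Image

# Synthetic test image (assumption): 16 x 32 pixels, every pixel a different color.
values = np.arange(512).reshape(16, 32)
pixels = np.zeros((16, 32, 3), dtype=np.uint8)
pixels[..., 0] = values % 256   # red channel cycles through 0..255
pixels[..., 1] = values // 256  # green channel is 0 or 1
image = Image.fromarray(pixels)  # uint8 array of shape (H, W, 3) becomes an RGB image

# The default limit is 256 colors; this image has 512, so the result is None.
default_result = image.getcolors()

# An image never has more colors than pixels, so this limit is always sufficient.
all_colors = image.getcolors(maxcolors=image.width * image.height)

print(default_result)
print(len(all_colors))
```

Note that an array loaded with cv2.imread() is in BGR order, so the tuples returned after Image.fromarray() would list blue first.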
Posts: 1,088
Threads: 143
Joined: Jul 2017
Your code doesn't work for me either!
Using just PIL's Image, you can get what you want:
from PIL import Image
img = '/home/pedro/Pictures/demeter2.jpeg' # multi-coloured harvest scene
img2 = '/home/pedro/Pictures/Greek-flag.jpg' # blue and white
im = Image.open(img2).convert("L")
im1 = Image.Image.getcolors(im) # gives output
im = Image.open(img2).convert("RGB")
im1 = Image.Image.getcolors(im) # no output
im = Image.open(img2).convert("CMYK")
im1 = Image.Image.getcolors(im) # no output
im = Image.open(img2).convert("P")
im1 = Image.Image.getcolors(im) # gives different output to "L"
im = Image.open(img).convert("P")
im1 = Image.Image.getcolors(im)
The last output for img:
Output: [(123, 0), (2542, 11), (3208, 12), (1034, 13), (331, 14), (1, 15), (14, 16), (1989, 17), (7476, 18), (3983, 19), (2996, 20), (165, 21), (16, 23), (80, 24), (356, 25), (279, 26), (25, 27), (1, 31), (41, 46), (1000, 47), (2177, 48), (2679, 49), (534, 50), (2, 51), (57, 52), (2015, 53), (10738, 54), (19666, 55), (23753, 56), (3704, 57), (68, 59), (2636, 60), (20338, 61), (65101, 62), (14642, 63), (15, 66), (2996, 67), (31908, 68), (11880, 69), (1, 73), (242, 74), (265, 75), (3, 82), (3, 83), (4, 84), (19, 88), (153, 89), (253, 90), (926, 91), (1338, 92), (298, 93), (72, 95), (355, 96), (8437, 97), (36334, 98), (11585, 99), (1, 101), (13, 102), (12112, 103), (119076, 104), (106189, 105), (217, 109), (28158, 110), (64745, 111), (23, 116), (215, 117), (5, 125), (1, 127), (1, 130), (22, 131), (32, 132), (19, 133), (21, 134), (2, 135), (2, 137), (21, 138), (362, 139), (13002, 140), (16431, 141), (55, 145), (30329, 146), (84806, 147), (1285, 152), (11728, 153), (6, 174), (7, 175), (4, 176), (2, 181), (732, 182), (4137, 183), (350, 188), (4514, 189), (3, 217), (1, 218), (1, 219), (1, 224), (12, 225)]
len(im1)
97
If you just do this, you get nothing:
im = Image.open(img)
im1 = Image.Image.getcolors(im)
im1
What the parameter maxcolors=256 is supposed to do I don't know. I tried with much bigger numbers and got nothing.
Why .convert("P") is needed is also a mystery to me!
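One likely explanation, going by the documentation quoted earlier in the thread: "L" and "P" images store a single 8-bit value per pixel, so they can never hold more than 256 distinct values and always fit the default limit, while an "RGB" image easily exceeds it. A minimal sketch with a random test image (an assumption, not the photos used above):

```python
import numpy as np
from PIL import Image

# Random RGB noise (assumed test data): practically every pixel has its own color.
rng = np.random.default_rng(0)
rgb = Image.fromarray(rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8))

results = {}
for mode in ("RGB", "L", "P"):
    colors = rgb.convert(mode).getcolors()  # default maxcolors=256
    # Store the number of entries, or None when the limit was exceeded.
    results[mode] = None if colors is None else len(colors)

print(results)
```

For the "P" image the second element of each tuple is a palette index rather than a color, which is why its output differs from the "L" output.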
Posts: 148
Threads: 34
Joined: May 2020
Dear deanhystad, dear Pedroski55,
thanks a lot for your answers!
In the meantime I read the information about maxcolors too...
Because I have to analyze pictures with lots of colors I will go back to the solution in thread "identify not white pixels in bmp", post #18.
I will have to analyze scanned pages (DIN A4) to find empty pages and will use multiprocessing later.
Is there a way to reduce the processing time besides multiprocessing?
Many thanks...
Posts: 6,777
Threads: 20
Joined: Feb 2020
My list of the best ways to improve speed, in order of their impact:
Efficient algorithm. (I've seen results thousands of times faster from a better algorithm.)
Minimize the amount of Python code. (Use external libraries. Up to hundreds of times faster than code written entirely in Python.)
Multiprocessing. (Typically 1.5 to 3 times faster if you use 2 to 4 cores.)
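As a rough illustration of the second point, here is a sketch that counts non-white pixels twice: once with a pure-Python loop and once with a single NumPy expression. The small test array and the function names are made up for the example; actual speedups depend on image size and hardware.

```python
import numpy as np


def count_non_white_loop(gray):
    """Count pixels below 255 with nested Python loops (slow on large images)."""
    count = 0
    for row in gray:
        for value in row:
            if value < 255:
                count += 1
    return count


def count_non_white_numpy(gray):
    """Same count, but the looping happens inside NumPy's compiled code."""
    return int(np.count_nonzero(gray < 255))


# Hypothetical 100 x 100 grayscale "page": white with a 10 x 10 dark patch.
page = np.full((100, 100), 255, dtype=np.uint8)
page[20:30, 40:50] = 0

print(count_non_white_loop(page), count_non_white_numpy(page))
```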
Posts: 1,088
Threads: 143
Joined: Jul 2017
To count the number of colours in an image, which can be as high as 16 million, I believe, use imagemagick from the command line for a quick result.
This image, img3, is quite big and has a lot of colours, but fewer than 250 000:
From the command line, bash shell:
Quote:identify -format %k /home/pedro/Downloads/damage_back_left_edge.jpg
The above command returns 243704:
Quote:pedro@pedro-HP:~$ identify -format %k /home/pedro/Downloads/damage_back_left_edge.jpg
243704pedro@pedro-HP:~$
Set that number in PIL, or say 250 000:
img3 = '/home/pedro/Downloads/damage_back_left_edge.jpg'
im = Image.open(img3).convert("RGB")
im1 = Image.Image.getcolors(im, maxcolors=250000)
len(im1)
Output: 243704
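If ImageMagick is not at hand, the colours can also be counted with NumPy, without knowing a suitable maxcolors in advance. The sketch below packs the three channels into one integer per pixel and counts the unique values; count_unique_colors() and the synthetic gradient image are assumptions for illustration, not part of PIL.

```python
import numpy as np
from PIL import Image


def count_unique_colors(image):
    """Return the number of distinct RGB colors in a PIL image."""
    # Widen to uint32 first so packing three 8-bit channels cannot overflow.
    rgb = np.asarray(image.convert("RGB"), dtype=np.uint32).reshape(-1, 3)
    packed = (rgb[:, 0] << 16) | (rgb[:, 1] << 8) | rgb[:, 2]
    return int(np.unique(packed).size)


# Synthetic stand-in image (assumption): a horizontal gradient with 200 gray levels.
gradient = np.tile(np.arange(200, dtype=np.uint8), (50, 1))
test_image = Image.fromarray(gradient)  # 2-D uint8 array becomes an "L" image

print(count_unique_colors(test_image))
```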
Posts: 148
Threads: 34
Joined: May 2020
Hi deanhystad,
hi Pedroski55,
thanks a lot for your answers!
I experimented a bit...
There is the function with which deanhystad helped me a lot ("primary_color_ratio()") - it takes 1.634 seconds to run.
There is the function which I wrote ("pdf_to_image_array()") - it takes 1.736 seconds to run.
There is the function which I found online ("count_colors_2()") - it takes 0.012 seconds to run.
Could you please advise me on what I should do?
My goal is:
Convert pdfs to bmps, detect empty pages in a (very) short time.
When this works, it should be optimized with multiprocessing (I got very good help within this forum at this topic already).
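One thing to keep in mind when comparing these numbers: the first two functions include the PDF conversion, while the timing of count_colors_2() in the code below only starts after the conversion has finished. A small helper for timing each stage separately could look like this; the name timed() and the stand-in workloads are hypothetical.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workloads (assumptions); replace with the real conversion and analysis steps.
def fake_convert():
    return list(range(100_000))


def fake_analyze(data):
    return sum(data)


pages, convert_seconds = timed(fake_convert)
total, analyze_seconds = timed(fake_analyze, pages)
print('convert: {:.6f}s, analyze: {:.6f}s'.format(convert_seconds, analyze_seconds))
```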
import numpy as np
from pdf2image import convert_from_path
from PIL import Image
import time


def primary_color_ratio(pdf_name):
    """Return ratio of pixels that are the "background" color."""
    pages = convert_from_path(pdf_name, 300)
    # save pdf to bmp
    bmpImage = pages[0].save("bmpImage.bmp", "BMP")
    # open Image
    img = Image.open(r"bmpImage.bmp")
    # reducing colors
    image_reduced = img.quantize(colors=2, method=None, kmeans=0, palette=None)
    #image_reduced.show()
    # convert to rgb
    image_rgb = image_reduced.convert('RGB')
    # I will do it later: examine just every xth pixel
    rgb = np.array(image_rgb).reshape(-1, 3)#[::x]
    # get 24Bit Color
    b24 = rgb[:, 0] * 65536 + rgb[:, 1] * 256 + rgb[:, 2]
    _, numberOfColors = np.unique(b24, return_counts=True)
    return max(numberOfColors) / max(len(b24), 1)


def pdf_to_image_array(pdf_name):
    pages = convert_from_path(pdf_name, 300)
    # save pdf to bmp
    bmpImage = pages[0].save("bmpImage.bmp", "BMP")
    # open Image
    img = Image.open("bmpImage.bmp")
    # reducing colors
    image_reduced = img.quantize(colors=2, method=None, kmeans=0, palette=None)
    image_array = np.array(image_reduced)
    return image_array


def count_colors_2(image_array) -> list:  # no need to give colors
    pil_image = Image.fromarray(image_array)
    colors_count_list = pil_image.getcolors(2)
    for count, c_bgr in colors_count_list:
        print('\tcolor {} appeared {} times'.format(c_bgr, count))
    return colors_count_list


pdf_name = "t2.pdf"
image_array = pdf_to_image_array(pdf_name)
start_time = time.time()
count_colors_2(image_array)
print('count_colors_2 time elapsed: {:.10f}s'.format(time.time() - start_time))
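Since convert_from_path() already returns PIL images, the detour through a BMP file is probably not needed. Below is a sketch of a blank-page check that works directly on such an image; is_blank_page(), its thresholds, and the synthetic test pages are assumptions for illustration, and real scans will likely need tuned values to cope with noise.

```python
import numpy as np
from PIL import Image


def is_blank_page(page_image, white_level=200, min_white_ratio=0.999):
    """Return True if nearly all pixels are at least as bright as white_level."""
    gray = np.asarray(page_image.convert("L"))  # 8-bit grayscale, shape (H, W)
    white_ratio = np.count_nonzero(gray >= white_level) / gray.size
    return bool(white_ratio >= min_white_ratio)


# Synthetic stand-ins for scanned pages (assumptions, not real scans).
empty_page = Image.new("RGB", (200, 300), "white")
printed_page = empty_page.copy()
printed_page.paste((0, 0, 0), (20, 20, 180, 60))  # black bar, as if it were a line of text

print(is_blank_page(empty_page), is_blank_page(printed_page))
```

With pdf2image this could be called as is_blank_page(pages[0]) on the list returned by convert_from_path().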
Posts: 6,777
Threads: 20
Joined: Feb 2020
I wouldn't expect pil_image.getcolors(2) to take very long to find 3 colors and return None.
Posts: 1,088
Threads: 143
Joined: Jul 2017
Feb-06-2024, 01:38 PM
(This post was last modified: Feb-06-2024, 01:38 PM by Pedroski55.)
code source, someone has always done these things before!
I didn't get the part that you only want to find blank pages. Sorry.
If a page with no text is "a blank page" (could only contain an image I suppose) then this will save all that messing around with pixels!
import fitz


# check whether the page has text or not.
def check_page(page):
    text = page.get_text()
    return len(text.strip()) == 0


path2infile = "/home/pedro/pdfs/pdfs/doctor_visits_with_blank_pages.pdf"  # 5 pages, 2 pages no text
path2outfile = "/home/pedro/pdfs/pdfs/doctor_visits_no_blank_pages.pdf"  # ends up with 3 pages
input_pdf = fitz.open(path2infile)
output_pdf = fitz.open()
for pgno in range(input_pdf.page_count):
    page = input_pdf[pgno]
    if not check_page(page):
        output_pdf.insert_pdf(input_pdf, from_page=pgno, to_page=pgno)
output_pdf.save(path2outfile)
input_pdf.close()
output_pdf.close()
You can add another function to check for images, if no text is found!
But, if all pages are numbered, that is text!
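A possible shape for that extra check, as a sketch: the function below treats a page as blank only when it has neither text nor embedded images. It relies on PyMuPDF's page.get_text() and page.get_images(); the name page_is_blank() and the stub class used to exercise it are made up here. Scanned pages usually consist of one large image, so this check alone would probably not find empty scans.

```python
def page_is_blank(page):
    """Return True if a PyMuPDF page has no text and no embedded images."""
    has_text = len(page.get_text().strip()) > 0
    has_images = len(page.get_images()) > 0
    return not has_text and not has_images


class StubPage:
    """Hypothetical stand-in for a fitz page, so the sketch runs without a PDF."""

    def __init__(self, text, images):
        self._text = text
        self._images = images

    def get_text(self):
        return self._text

    def get_images(self):
        return self._images


print(page_is_blank(StubPage("  \n", [])))         # whitespace only, no images
print(page_is_blank(StubPage("", [("xref", 0)])))  # image only
print(page_is_blank(StubPage("Page 1", [])))       # text only
```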
Posts: 6,777
Threads: 20
Joined: Feb 2020
Doh!
In my defense, this did start as a previous thread titled: identify not white pixels in bmp. The pdf came later. Still, Doh!