Count image's colors very fast - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Count image's colors very fast (/thread-41547.html)

Count image's colors very fast - flash77 - Feb-03-2024

Dear community,

I found some code on Stack Overflow that is supposed to count an image's colors very fast:
https://stackoverflow.com/questions/71399313/count-pixel-color-in-an-image

I'm trying to get the second function working:

    def count_colors_2(cv_img: np.array) -> list:  # no need to give colors

The situation is the following: I've got a PDF file ("t2.pdf"), which I convert to the BMP file "bmpImage.bmp" (line 23). Then I open the image with OpenCV (line 25). I don't know why "colors_count_list" is NoneType (line 14).

Here is my attempt:

    import time
    import numpy as np
    from PIL import Image
    from pdf2image import convert_from_path
    import cv2

    colors_count_list = []


    def count_colors_2(cv_image: np.array) -> list:  # no need to give colors
        pil_image = Image.fromarray(cv_image)
        colors_count_list = pil_image.getcolors()
        print('count_colors time elapsed: {:.10f}s'.format(time.time() - start_time))
        for count, c_bgr in colors_count_list:
            print('\tcolor {} appeared {} times'.format(c_bgr, count))
        return colors_count_list


    if __name__ == '__main__':
        start_time = time.time()
        # save pdf to bmp
        pages = convert_from_path("t2.pdf", 300)
        pages[0].save("bmpImage.bmp", "BMP")
        # Open image using openCV2
        opencv_image = cv2.imread("bmpImage.bmp")
        colors_count_list = count_colors_2(opencv_image)
        print(colors_count_list)

Please be so kind and help me...
Many thanks...

RE: Count image's colors very fast - deanhystad - Feb-03-2024

From the PIL Image documentation:

Quote: Image.getcolors(maxcolors=256)

Your image must have more than 256 colors. It worked fine when I passed an image with 5 colors.

RE: Count image's colors very fast - Pedroski55 - Feb-04-2024

Your code doesn't work for me either!
Just using Image, you can get what you want:

    from PIL import Image

    img = '/home/pedro/Pictures/demeter2.jpeg'  # multi-coloured harvest scene
    img2 = '/home/pedro/Pictures/Greek-flag.jpg'  # blue and white
    im = Image.open(img2).convert("L")
    im1 = Image.Image.getcolors(im)  # gives output
    im = Image.open(img2).convert("RGB")
    im1 = Image.Image.getcolors(im)  # no output
    im = Image.open(img2).convert("CMYK")
    im1 = Image.Image.getcolors(im)  # no output
    im = Image.open(img2).convert("P")
    im1 = Image.Image.getcolors(im)  # gives different output to "L"
    im = Image.open(img).convert("P")
    im1 = Image.Image.getcolors(im)

The last output for img:

    len(im1)
    97

If you just do this, you get nothing:

    im = Image.open(img)
    im1 = Image.Image.getcolors(im)
    im1

What the parameter maxcolors=256 is supposed to do I don't know. I tried with much bigger numbers and got nothing. Why .convert("P") is needed is also a mystery to me!

RE: Count image's colors very fast - flash77 - Feb-04-2024

Dear deanhystad, dear Pedroski55,

thanks a lot for your answers!

In the meantime I read the information about maxcolors too...
Because I have to analyze pictures with lots of colors, I will go back to the solution in the thread "identify not white pixels in bmp", post #18.
I will have to analyze scanned pages (DIN A4) to find empty pages and will use multiprocessing later.
Is there a way to lower the processing time besides multiprocessing?

Many thanks...

RE: Count image's colors very fast - deanhystad - Feb-04-2024

My list of the best ways to improve speed, in order of their impact:

1. Efficient algorithm. (I've seen results thousands of times faster from a better algorithm.)
2. Minimize the amount of Python code. (Use external libraries. Up to hundreds of times faster than all code written in Python.)
3. Multiprocessing. (Typically 1.5 to 3 times faster if you use 2 to 4 cores.)

RE: Count image's colors very fast - Pedroski55 - Feb-05-2024

To count the number of colours in an image, which can be as high as 16 million, I believe, use ImageMagick from the command line for a quick result.

This image, img3, is quite big and has a lot of colours, but fewer than 250 000.

From the command line, bash shell:

Quote: identify -format %k /home/pedro/Downloads/damage_back_left_edge.jpg

The above command returns 243704:

Quote: pedro@pedro-HP:~$ identify -format %k /home/pedro/Downloads/damage_back_left_edge.jpg

Set that number in PIL, or say 250 000:

    img3 = '/home/pedro/Downloads/damage_back_left_edge.jpg'
    im = Image.open(img3).convert("RGB")
    im1 = Image.Image.getcolors(im, maxcolors=250000)
    len(im1)
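
A note on the maxcolors question raised above: as the Pillow documentation describes it, getcolors() returns None once the image holds more distinct colors than maxcolors. An image cannot have more colors than pixels, so passing the pixel count as the limit should always yield a list, without asking ImageMagick first. The following is only a minimal sketch on a synthetic image; the helper name count_colors is hypothetical and not from the thread.

```python
from PIL import Image


def count_colors(image: Image.Image) -> int:
    """Return the number of distinct colors in a Pillow image.

    Hypothetical helper for illustration, not part of Pillow.
    """
    # An image cannot contain more colors than pixels, so this limit
    # keeps getcolors() from returning None.
    colors = image.getcolors(maxcolors=image.width * image.height)
    return len(colors)


# Synthetic 20x20 RGB image in which every pixel has its own color.
demo = Image.new("RGB", (20, 20))
for y in range(20):
    for x in range(20):
        demo.putpixel((x, y), (x, y, 0))

print(demo.getcolors())    # None: 400 colors exceed the default limit of 256
print(count_colors(demo))  # 400
```

For a 300 dpi A4 scan the limit is several million, which is fine for correctness, but the returned list can be large.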
RE: Count image's colors very fast - flash77 - Feb-05-2024

Hi deanhystad, hi Pedroski55,

thanks a lot for your answers!

I experimented a bit...

The function deanhystad helped me a lot with ("primary_color_ratio()") takes 1.634 seconds to run.
The function I wrote myself ("pdf_to_image_array()") takes 1.736 seconds to run.
The function I found online ("count_colors_2()") takes 0.012 seconds to run.

Could you please give me some advice on what I should do?
My goal is: convert PDFs to BMPs and detect empty pages in a (very) short time.
When this works, it should be optimized with multiprocessing (I already got very good help in this forum on that topic).

    import numpy as np
    from pdf2image import convert_from_path
    from PIL import Image
    import time


    def primary_color_ratio(pdf_name):
        """Return ratio of pixels that are the "background" color."""
        pages = convert_from_path(pdf_name, 300)
        # save pdf to bmp
        bmpImage = pages[0].save("bmpImage.bmp", "BMP")
        # open Image
        img = Image.open(r"bmpImage.bmp")
        # reducing colors
        image_reduced = img.quantize(colors=2, method=None, kmeans=0, palette=None)
        #image_reduced.show()
        # convert to rgb
        image_rgb = image_reduced.convert('RGB')
        # I will do it later: examine just every xth pixel
        # widen to uint32 so the 24-bit packing below cannot overflow uint8
        rgb = np.array(image_rgb, dtype=np.uint32).reshape(-1, 3)#[::x]
        # get 24Bit Color
        b24 = rgb[:, 0] * 65536 + rgb[:, 1] * 256 + rgb[:, 2]
        _, numberOfColors = np.unique(b24, return_counts=True)
        return max(numberOfColors) / max(len(b24), 1)


    def pdf_to_image_array(pdf_name):
        pages = convert_from_path(pdf_name, 300)
        # save pdf to bmp
        bmpImage = pages[0].save("bmpImage.bmp", "BMP")
        # open Image
        img = Image.open("bmpImage.bmp")
        # reducing colors
        image_reduced = img.quantize(colors=2, method=None, kmeans=0, palette=None)
        image_array = np.array(image_reduced)
        return image_array


    def count_colors_2(image_array) -> list:  # no need to give colors
        pil_image = Image.fromarray(image_array)
        colors_count_list = pil_image.getcolors(2)
        for count, c_bgr in colors_count_list:
            print('\tcolor {} appeared {} times'.format(c_bgr, count))
        return colors_count_list


    pdf_name = "t2.pdf"
    image_array = pdf_to_image_array(pdf_name)
    start_time = time.time()
    count_colors_2(image_array)
    print('count_colors_2 time elapsed: {:.10f}s'.format(time.time() - start_time))

RE: Count image's colors very fast - deanhystad - Feb-05-2024

I wouldn't expect pil_image.getcolors(2) to take very long to find 3 colors and return None.

RE: Count image's colors very fast - Pedroski55 - Feb-06-2024

code source, someone has always done these things before!

I didn't get the part that you only want to find blank pages. Sorry.

If a page with no text is "a blank page" (it could contain only an image, I suppose), then this will save all that messing around with pixels!

    import fitz

    # check whether the page has text or not.
    def check_page(page):
        text = page.get_text()
        return len(text.strip()) == 0

    path2infile = "/home/pedro/pdfs/pdfs/doctor_visits_with_blank_pages.pdf"  # 5 pages, 2 pages no text
    path2outfile = "/home/pedro/pdfs/pdfs/doctor_visits_no_blank_pages.pdf"  # ends up with 3 pages
    input_pdf = fitz.open(path2infile)
    output_pdf = fitz.open()
    for pgno in range(input_pdf.page_count):
        page = input_pdf[pgno]
        if not check_page(page):
            output_pdf.insert_pdf(input_pdf, from_page=pgno, to_page=pgno)
    output_pdf.save(path2outfile)
    input_pdf.close()
    output_pdf.close()

You can add another function to check for images if no text is found! But if all pages are numbered, that is text!

RE: Count image's colors very fast - deanhystad - Feb-06-2024

Doh! In my defense, this did start as a previous thread titled "identify not white pixels in bmp". The PDF came later. Still, doh!
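
For the scanned pages flash77 mentioned, the text check may find nothing to extract if the scans carry no text layer (an assumption about those files), so a pixel-based test may still be needed. The sketch below measures the share of near-white pixels on a grayscale copy in a single NumPy pass, with no BMP round trip and no quantizing. The names background_ratio and is_blank and the thresholds 200 and 0.99 are hypothetical values for illustration, not taken from the thread, and would need tuning on real scans.

```python
import numpy as np
from PIL import Image


def background_ratio(image: Image.Image, white_level: int = 200) -> float:
    """Return the fraction of pixels at least as bright as white_level."""
    gray = np.asarray(image.convert("L"))  # 8-bit grayscale, values 0..255
    if gray.size == 0:
        return 1.0  # treat an empty image as all background
    return float(np.count_nonzero(gray >= white_level)) / gray.size


def is_blank(image: Image.Image, min_ratio: float = 0.99,
             white_level: int = 200) -> bool:
    """Call a page blank when nearly all of its pixels are near-white."""
    return background_ratio(image, white_level) >= min_ratio


# Synthetic stand-ins for rendered pages (real input would come from
# convert_from_path, which already returns PIL images).
blank_page = Image.new("RGB", (100, 100), "white")
printed_page = blank_page.copy()
printed_page.paste((0, 0, 0), (10, 10, 60, 40))  # 50 x 30 black block

print(is_blank(blank_page))    # True
print(is_blank(printed_page))  # False
```

Because convert_from_path already returns PIL images, pages[0] could be passed to is_blank directly, which skips the save to and reload from "bmpImage.bmp".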