Oct-05-2023, 07:08 PM
(This post was last modified: Oct-05-2023, 07:12 PM by deanhystad.)
You are mistaken about pandas and numpy. Pandas uses numpy, so at best pandas can be almost as fast as numpy; usually pandas is much slower. If pandas is doing something faster, it just means you are not using the right numpy functions. In a recent post asking why pandas was running so much slower in Python 3.11 than in Python 3.9 (the issue was related to pandas versions, not Python), I made a numpy solution that runs 5000 times faster than the pandas solution in that post. Speed is more about HOW you do things than what tools you are using.
It takes 3 seconds for my primary_color_ratio(image) to process a 1920 x 1080 image. Over a minute to process 1 second of film. Yuck!
I found it takes almost a second to get the pixels from the image using getdata(), so you should go back to using "pixels = np.array(image).reshape(-1, 3)", which takes less than 0.01 seconds.
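If you want to see the gap yourself, a quick timing sketch like this will show it (same test image as below; the exact numbers will depend on your machine):

from PIL import Image
import numpy as np
from time import time

image = Image.open("test.jpg")

start = time()
slow = np.array(image.getdata())        # pixel access through getdata()
print("getdata():", time() - start)

start = time()
fast = np.array(image).reshape(-1, 3)   # direct conversion to an (N, 3) array
print("np.array():", time() - start)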
Thinking that unique is taking a long time because we are looking for unique arrays instead of unique scalars, I converted each RGB triple to a single 24-bit color value.
from PIL import Image
import numpy as np
from time import time


def primary_color_ratio(filename):
    """Return ratio of pixels that are the "background" color."""
    image = Image.open(filename)
    # Flatten the image to an (N, 3) array; widen the dtype so the
    # packing arithmetic cannot overflow the uint8 pixel values.
    rgb = np.array(image).reshape(-1, 3).astype(np.uint32)
    # Pack each RGB triple into one 24-bit integer so unique() compares
    # scalars instead of rows.
    b24 = rgb[:, 0] * 65536 + rgb[:, 1] * 256 + rgb[:, 2]
    _, counts = np.unique(b24, return_counts=True)
    return max(counts) / max(len(b24), 1)


start = time()
print(primary_color_ratio("test.jpg"))
print(time() - start)
Output:
0.1281328125
0.23400020599365234
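For comparison, the counting step with pandas looks roughly like this. This is a sketch of the groupby approach, not necessarily the exact code I timed:

from PIL import Image
import numpy as np
import pandas as pd


def primary_color_ratio_pandas(filename):
    """Same ratio, but counting the packed colors with pandas groupby."""
    image = Image.open(filename)
    rgb = np.array(image).reshape(-1, 3).astype(np.uint32)
    b24 = rgb[:, 0] * 65536 + rgb[:, 1] * 256 + rgb[:, 2]
    counts = pd.Series(b24).groupby(b24).size()  # pixels per packed color
    return counts.max() / max(len(b24), 1)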
It takes twice as long if I replace numpy.unique with pandas.groupby.

6 seconds to process 1 second of film if you are doing it in HD. That's not terrible. You could probably do better if you wrote this in C. You wouldn't have to process the entire frame to decide if it was blank.
That's an interesting idea. Instead of processing every pixel, process 1 in 100 pixels or 1 in 4 rows or something like that. This uses every 100th pixel.
from PIL import Image
import numpy as np
from time import time


def primary_color_ratio(filename):
    """Return ratio of pixels that are the "background" color."""
    image = Image.open(filename)
    # Flatten the image and keep only every 100th pixel.
    rgb = np.array(image).reshape(-1, 3)[::100].astype(np.uint32)
    # Pack each RGB triple into one 24-bit integer.
    b24 = rgb[:, 0] * 65536 + rgb[:, 1] * 256 + rgb[:, 2]
    _, counts = np.unique(b24, return_counts=True)
    return max(counts) / max(len(b24), 1)


start = time()
print(primary_color_ratio("test.jpg"))
print(time() - start)
Output:
0.1255425347222222
0.05700945854187012
The result is slightly different, but it is also about 4x faster: 1.4 seconds to process 1 second of film. The time to read the image and convert it to pixels begins to be significant, and further reducing the number of pixels provides smaller and smaller speed gains. For example, processing every 1000th pixel takes 0.055 seconds.

Because none of the frames depend on other frames, processing multiple frames in parallel would be a simple task.
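A minimal sketch of what that could look like with multiprocessing.Pool, assuming the frames have already been dumped to individual image files (the file names and the 0.95 "blank" threshold below are just placeholders):

from multiprocessing import Pool

# primary_color_ratio() is the function defined above.
frame_files = [f"frames/frame_{i:05d}.jpg" for i in range(240)]  # hypothetical frame dump

if __name__ == "__main__":
    with Pool() as pool:                        # defaults to one worker per CPU core
        ratios = pool.map(primary_color_ratio, frame_files)
    blank = [f for f, r in zip(frame_files, ratios) if r > 0.95]  # placeholder threshold
    print(len(blank), "frames look blank")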