Aug-16-2019, 07:32 AM
I'm noticing something odd with byte-string concatenation compared with text-string concatenation.
I'm building a BMP file. The core code (ignoring row padding, headers, etc.) is:
    for Y in range(0, XY[1]):              # BMPs run L to R from the bottom row
        for X in range(0, XY[0]):
            # square rainbow as opposed to real data
            x = floor((255 * X) / XY[0])
            y = floor((255 * Y) / XY[1])
            (r, g, b) = [x, y, 128]        # Colour(data[x, y])
            pixels += struct.pack('<BBB', b, g, r)

I've run the code in Python 2.7 (char strings) and in Python 3.7 (byte strings).
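In case anyone wants to reproduce this, here's a minimal self-contained version of the loop, scaled down to 256 x 256 so it finishes quickly (the `XY` value is just a stand-in for my image dimensions):

```python
import struct
from math import floor

XY = (256, 256)   # stand-in dimensions; my real test was 1024 x 1024

pixels = b''
for Y in range(0, XY[1]):              # BMPs run L to R from the bottom row
    for X in range(0, XY[0]):
        # square rainbow as opposed to real data
        x = floor((255 * X) / XY[0])
        y = floor((255 * Y) / XY[1])
        (r, g, b) = (x, y, 128)
        pixels += struct.pack('<BBB', b, g, r)

# 3 bytes per pixel
assert len(pixels) == XY[0] * XY[1] * 3
```

On Python 2.7 the same loop uses a plain (char) string for `pixels` instead of a bytes object.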
This was tested on a 1024 x 1024 pixel image, and the elapsed time was logged every 100 rows.
Overall, Python 2.7 runs in 2.4 seconds, while Python 3.7 takes over 8 minutes (i7-8700 CPU @ 3.2 GHz, 16 GB RAM).
3.7 is orders of magnitude slower, and there is a major slowdown once the string reaches around the 1 MB mark, as illustrated by these graphs of elapsed time every 100 rows (roughly 300 KB of data added to the string per point):
[Image: graphs of elapsed time, bytes vs text]
I've tried googling this and haven't found much, which surprised me - I'd have thought this would be a well-known issue. Apart from the obvious slowness of the byte strings, the pattern is very striking. What happens at the 1 MB mark? Obviously a change of algorithm of some sort.
Does anyone have any advice? I'm wondering about converting the struct.pack() output to a char string (cf. Python 2.7), concatenating that, and then converting back to a byte string once it's complete. Or would that conversion back be super slow?
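To make the idea concrete, something like this is what I mean - latin-1 is my assumption for the codec, since it maps bytes 0-255 one-to-one to code points, so the round-trip should be lossless:

```python
import struct

# Accumulate text instead of bytes, then convert back once at the end.
pixels_txt = ''
for (b, g, r) in [(10, 20, 30), (40, 50, 60)]:   # stand-in pixel data
    pixels_txt += struct.pack('<BBB', b, g, r).decode('latin-1')

pixels = pixels_txt.encode('latin-1')            # one conversion at the end
assert pixels == b'\x0a\x14\x1e\x28\x32\x3c'
```

The question is whether that final encode over a ~3 MB string is cheap (a single linear pass) or whether it hits the same kind of slowdown.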