Python Forum
Byte string catenation inefficient in 3.7?
#11
I managed to get similar performance in Python 3 by using a temporary list to store each row of pixels. It seems the problem does indeed come from repeatedly concatenating a single pixel onto a bytes string. Here is my faster code.
# Testing struct.pack and string catenation in Python2 and 3
# This is a demo cut down from real app (which draws charts from survey data)
# creates a 'square rainbow' bmp file
## edit for Py 2 (char strings) or 3 (byte strings) versions
 
# edit these for your set up and test
Size = 1024    # test image size, pixels
# path = 'D:/Python37/MyScripts/Test/'  # for the bmp file
 
import os
import struct
from math import floor
import time

path = os.path.join(os.path.dirname(__file__), 'test', '')
 
def BuildImage(name, XY):
    # name : filename
    # XY : (width, height) pixels
 
    # for stats and timing
    n0 = 0
    t00 = time.perf_counter()   # time.clock() is deprecated and removed in 3.8
    t01 = t00
     
    chtName = path+'cht_'+name+'3.bmp'
    print("drawing "+chtName)
     
    hdr = bmpHdr(XY)
    #print(hdr)
     
    ##pixels =''  ##  Py2
    pixels = b''  ## Py3
     
    for Y in range(0, XY[1]): # (BMPs are L to R from the bottom L row)
        temp = []
        for X in range(0, XY[0]):
            # square rainbow for time tests -  as opposed to real data
            x = floor((255 * X)/XY[0])
            y = floor((255 * Y)/XY[1])
            (r,g,b) = [x, y, 128]   #Colour(data[x ,y])
            temp.append(struct.pack('<BBB',b,g,r))
        # one join per row instead of one catenation per pixel
        pixels += b''.join(temp)
             
        # rows are padded to a multiple of 4 bytes; use // so padding stays
        # an int (with / it becomes a float in Py3, which range() rejects)
        row_mod = (hdr['width']*hdr['colordepth']//8) % 4
        if row_mod == 0:
            padding = 0 
        else:
            padding = (4 - row_mod)
        padbytes = b'\x00' * padding
        pixels = pixels + padbytes
 
        # stats log (first column is the number of rows still to draw)
        if Y % 100 == 0:
            n = len(pixels)
            t02 = time.perf_counter()
            log = "{0:5d} L={1:8,d}, delta={2:7,d}, pad={3:4d}".format(XY[1]-Y, n, n-n0, padding)
            log += ", time = {0:6.3f}, cum = {1:7.3f}".format(t02-t01, t02-t00)
            print(log)
            t01 = t02
            n0 = n
     
    print("pixels generated, len = "+str(len(pixels)))
    bmp_write(chtName, hdr, pixels)
     
 
def bmpHdr(XY):
    print("bmphdr xy "+str(XY))
    hdr = {
        'mn1':66,
        'mn2':77,
        'filesize':0,
        'undef1':0,
        'undef2':0,
        'offset':54,
        'headerlength':40,
        'width':XY[0],   #256
        'height':XY[1],  #256
        'colorplanes':1,   # the BMP info header requires exactly 1 plane
        'colordepth':24,
        'compression':0,
        'imagesize':0,
        'res_hor':0,
        'res_vert':0,
        'palette':0,
        'importantcolors':0
        }
    return hdr
 
 
#Function to write a bmp file.  It takes a dictionary (hdr) of
#header values and the pixel data (pixels) and writes them
#to a file.  This function is called at the end of BuildImage.
def bmp_write(name, hdr, pixels):
    print('making bmp with '+str(len(pixels))+" bytes of pixel data")
    mn1 = struct.pack('<B',hdr['mn1'])
    mn2 = struct.pack('<B',hdr['mn2'])
    filesize = struct.pack('<L',hdr['filesize'])
    undef1 = struct.pack('<H',hdr['undef1'])
    undef2 = struct.pack('<H',hdr['undef2'])
    offset = struct.pack('<L',hdr['offset'])
    headerlength = struct.pack('<L',hdr['headerlength'])
    width = struct.pack('<L',hdr['width'])
    height = struct.pack('<L',hdr['height'])
    colorplanes = struct.pack('<H',hdr['colorplanes'])
    colordepth = struct.pack('<H',hdr['colordepth'])
    compression = struct.pack('<L',hdr['compression'])
    imagesize = struct.pack('<L',hdr['imagesize'])
    res_hor = struct.pack('<L',hdr['res_hor'])
    res_vert = struct.pack('<L',hdr['res_vert'])
    palette = struct.pack('<L',hdr['palette'])
    importantcolors = struct.pack('<L',hdr['importantcolors'])
    #assemble the header, then write header + pixel data
    hdr_bytes = mn1+mn2
    hdr_bytes += filesize+undef1+undef2
    hdr_bytes += offset+headerlength+width+height
    hdr_bytes += colorplanes+colordepth+compression+imagesize+res_hor+res_vert
    hdr_bytes += palette+importantcolors
    print("headers = "+str(hdr_bytes))
    bmp = hdr_bytes + pixels
    print('writing bmp, len = '+str(len(bmp)))
    #the with block closes the file even if the write fails
    with open(name,'wb') as outfile:   # 'bitmap_image.bmp'
        outfile.write(bmp)
 
###################################    
def main():
 
    time0 = time.perf_counter()   # time.clock() is deprecated and removed in 3.8
    print("start {0}x{0} bmp file @ {1:.3f}".format(Size, time0))
 
    # set the size of the bmp image here
    BuildImage("test", (Size,Size))
    time1 = time.perf_counter()
    print("Chart complete, run time {0:.3f} secs".format(time1-time0))
     
 
if __name__ == '__main__':
    main()
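
The field-by-field header assembly in bmp_write can also be written as a single struct.pack call. This is only a minimal sketch, assuming the same layout the hdr dictionary describes (a 14-byte file header followed by a 40-byte info header, 24-bit uncompressed pixels). The name bmp_header is hypothetical, and unlike the posted code it fills in the file size and image size instead of leaving them at 0.

```python
import struct

def bmp_header(width, height):
    """Return the 54-byte header for a 24-bit BMP (hypothetical helper)."""
    # Each row of pixel data is padded up to a multiple of 4 bytes.
    row_size = (width * 3 + 3) // 4 * 4
    image_size = row_size * height
    return struct.pack(
        '<2sIHHIIiiHHIIiiII',
        b'BM',                # magic numbers 66, 77
        54 + image_size,      # total file size
        0, 0,                 # two reserved fields
        54,                   # offset of the pixel data
        40,                   # info header length
        width, height,
        1,                    # colour planes
        24,                   # bits per pixel
        0,                    # no compression
        image_size,
        0, 0,                 # horizontal and vertical resolution
        0, 0)                 # palette size, important colours

print(len(bmp_header(1024, 1024)))
```

The '<' prefix selects little-endian packing with no alignment padding, which is why the field sizes add up to exactly 54 bytes.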
Reply
#12
The same catenation algorithm using char strings is also very much faster, and roughly linear.
It looks like bytes catenation switches to a very poor algorithm once the string size reaches around 1 MB.

I like the NumPy 2-D array approach conceptually, though there are a couple of hoops to jump through to get from a NumPy int array to a bytes vector.

Quote: indeed comes from the repeated concatenation of a single pixel to a bytes string. Here is my faster code.

... to an ever-increasing bytes string ...

I am curious as to the difference between char and bytes strings in this regard.
Reply
#13
I've found this reference. It seems that you could replace bytes with bytearray for better performance. The problem is also known to David Beazley, see here.
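
A minimal sketch of the bytearray idea, assuming the same 24-bit 'square rainbow' pixels as the code earlier in the thread. The name build_pixels is hypothetical and not part of the original script. Because a bytearray is mutable, += extends the existing buffer rather than building a new object for every pixel.

```python
from math import floor

def build_pixels(width, height):
    """Return bottom-up 24-bit BMP pixel data (hypothetical helper)."""
    # Rows in a BMP are padded to a multiple of 4 bytes.
    padding = (4 - (width * 3) % 4) % 4
    pad = bytes(padding)              # that many zero bytes
    pixels = bytearray()              # mutable buffer, grows in place
    for Y in range(height):
        for X in range(width):
            r = floor(255 * X / width)
            g = floor(255 * Y / height)
            pixels += bytes((128, g, r))   # BMP stores blue, green, red
        pixels += pad
    return bytes(pixels)              # immutable copy, ready to write

data = build_pixels(5, 2)
print(len(data))
```

The final bytes(pixels) copy is optional, since file.write() accepts a bytearray directly.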
Reply
#14
(Aug-16-2019, 08:52 PM)Gribouillis Wrote: I managed to get a similar performance in python 3 by using a temporary list to store a row of pixels. It seems that the problem indeed comes from the repeated concatenation of a single pixel to a bytes string. Here is my faster code.

Good observation. You get just about the same speed-up using a string for the row array (as opposed to your list); the key, I think, is working row-wise.
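
A minimal sketch of the row-wise variant, assuming the same rainbow pixels and row padding as the posted code; build_rows is a hypothetical name. Each row is built by catenating onto a short bytes string, so any copying is limited to one row's length, and the rows are then joined once at the end.

```python
import struct

def build_rows(width, height):
    """Build 24-bit BMP pixel data one row at a time (hypothetical helper)."""
    padding = (4 - (width * 3) % 4) % 4    # rows are multiples of 4 bytes
    rows = []
    for Y in range(height):
        g = 255 * Y // height
        row = b''                          # short string, restarted per row
        for X in range(width):
            r = 255 * X // width
            row += struct.pack('<BBB', 128, g, r)
        rows.append(row + b'\x00' * padding)
    return b''.join(rows)                  # one catenation for the image

print(len(build_rows(1024, 1024)))
```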

(Aug-17-2019, 07:21 AM)Gribouillis Wrote: I've found this reference. It seems that you could replace bytes with bytearray for better performance. The problem is also known to David Beazley, see here.

What has me confused is that the reason given for the poor performance is immutability (bytearrays are mutable).
However, bytes and string objects are both immutable, so immutability in and of itself is not the reason.
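
One likely explanation, which as far as I know is a CPython implementation detail rather than a language guarantee: the interpreter special-cases s += t for str when nothing else references s, and resizes the string in place. Python 2 did the same for its str type, but Python 3 bytes has no such shortcut, so each += typically copies the whole string. The sketch below is a rough timing comparison under that assumption; the function names, CHUNK and N are illustrative only, and the timings will vary by machine and Python version.

```python
import time

CHUNK = b'abc'   # stands in for one packed pixel
N = 20_000       # number of catenations; raise it to see the gap widen

def concat_str(n):
    s = ''
    for _ in range(n):
        s += 'abc'          # CPython can often resize this in place
    return s

def concat_bytes(n):
    b = b''
    for _ in range(n):
        b += CHUNK          # typically builds a new bytes object each time
    return b

def concat_bytearray(n):
    buf = bytearray()
    for _ in range(n):
        buf += CHUNK        # extends the buffer in place
    return bytes(buf)

def join_list(n):
    parts = []
    for _ in range(n):
        parts.append(CHUNK)
    return b''.join(parts)  # single catenation at the end

for func in (concat_str, concat_bytes, concat_bytearray, join_list):
    start = time.perf_counter()
    result = func(N)
    elapsed = time.perf_counter() - start
    print("{0:18s} len={1:7,d}  {2:.4f} s".format(func.__name__, len(result), elapsed))
```

All four functions produce the same content; only the time taken to build it differs.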
Reply

