Aug-11-2018, 01:40 AM
I am using lookup tables to define the logic for converting a sequence of octets in UTF-8 form to a Unicode code point. There are actually two tables. Both the index and the stored value are always in range(256), so does it make sense to use a bytearray? When doing a lookup, the index is always an int, and the looked-up value is used as an int, which fits the bytearray model very well. I read somewhere that a bytearray is stored as contiguous bytes in memory, which should make indexed lookups very fast. Is this true?
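A quick way to check both points is sketched below, assuming CPython 3.5 or later. It shows that indexing a bytearray returns a plain int, uses memoryview to confirm the buffer is one contiguous block of 1-byte items, and times lookups against a list and a bytes object. The variable names are illustrative only. In CPython, much of the per-lookup cost is interpreter overhead, so the speed difference may be small; the printed timings depend on your machine and Python version.

```python
import timeit

# Three candidate containers holding the same 256 small ints.
as_list = list(range(256))
as_bytes = bytes(range(256))
as_bytearray = bytearray(range(256))

# Indexing any of them yields a plain int.
for table in (as_list, as_bytes, as_bytearray):
    assert type(table[200]) is int

# A memoryview exposes the underlying buffer: one contiguous block of 1-byte items.
view = memoryview(as_bytearray)
print(view.contiguous, view.itemsize, view.nbytes)  # True 1 256

# Rough timing of one million lookups per container (results vary by machine).
for name, table in (("list", as_list), ("bytes", as_bytes), ("bytearray", as_bytearray)):
    seconds = timeit.timeit("t[200]", globals={"t": table}, number=1_000_000)
    print(f"{name:9s} {seconds:.3f}s")
```

The main guaranteed win of bytearray is memory: 256 bytes of payload, compared with 256 object references for a list.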
Here is my logic to build the tables:
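```python
# num[o]: total length of the sequence whose lead octet is o (0 = continuation byte)
# bit[o]: the payload bits carried by lead octet o
num = bytearray([0]) * 256
bit = bytearray([0]) * 256

# (first octet, stop octet, sequence length, mask for the lead octet's payload bits)
ctl = ((0, 128, 1, 255),
       (128, 192, 0, 0),
       (192, 224, 2, 31),
       (224, 240, 3, 15),
       (240, 248, 4, 7),
       (248, 252, 5, 3),
       (252, 254, 6, 1),
       (254, 255, 7, 0),
       (255, 256, 8, 0),
       )
for a, b, c, d in ctl:
    for o in range(a, b):
        num[o] = c
        bit[o] = o & d
```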
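One possible way to use the two tables is sketched below. The lead octet selects the sequence length from num and the starting payload bits from bit. Each continuation octet then contributes its low 6 bits. The helpers decode_one and decode_codepoints are hypothetical names for illustration. This sketch only checks for truncation and misplaced continuation octets; it does not reject overlong forms, surrogates, or values above U+10FFFF.

```python
# Build the same two tables as above.
num = bytearray(256)
bit = bytearray(256)
ctl = ((0, 128, 1, 255), (128, 192, 0, 0), (192, 224, 2, 31),
       (224, 240, 3, 15), (240, 248, 4, 7), (248, 252, 5, 3),
       (252, 254, 6, 1), (254, 255, 7, 0), (255, 256, 8, 0))
for a, b, c, d in ctl:
    for o in range(a, b):
        num[o] = c
        bit[o] = o & d


def decode_one(data, i):
    """Decode one code point starting at data[i]; return (code_point, next_index)."""
    lead = data[i]
    n = num[lead]                      # sequence length from the lead octet
    if n == 0:
        raise ValueError(f"unexpected continuation octet at {i}")
    if i + n > len(data):
        raise ValueError(f"truncated sequence at {i}")
    cp = bit[lead]                     # payload bits carried by the lead octet
    for k in range(i + 1, i + n):
        cont = data[k]
        if cont & 0xC0 != 0x80:        # continuation octets look like 10xxxxxx
            raise ValueError(f"bad continuation octet at {k}")
        cp = (cp << 6) | (cont & 0x3F)
    return cp, i + n


def decode_codepoints(data):
    """Decode a whole bytes-like object into a list of code points."""
    out = []
    i = 0
    while i < len(data):
        cp, i = decode_one(data, i)
        out.append(cp)
    return out


print([hex(cp) for cp in decode_codepoints("aé€😀".encode("utf-8"))])
# ['0x61', '0xe9', '0x20ac', '0x1f600']
```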