Python Forum
lookup tables
#1
i am using lookup tables to define the logic that converts a sequence of octets in UTF-8 form to a Unicode code point. there are actually two tables. both the index and the value are in range(256), so does it make sense to use a bytearray? when doing a lookup, the index is always an int, and the looked-up value is used like an int. this fits the model of bytearray very well. somewhere i read that a bytearray is stored as contiguous bytes in memory, which should make the lookup indexing very fast. is this true?

here is my logic to build the tables:
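# num[o]: sequence length implied by lead octet o (0 marks a continuation octet)
# bit[o]: payload bits carried by octet o when it is a lead octet
num=bytearray([0])*256
bit=bytearray([0])*256
# each row: (first octet, one past last octet, value for num, mask for bit)
ctl=((0,128,1,255),
     (128,192,0,0),
     (192,224,2,31),
     (224,240,3,15),
     (240,248,4,7),
     (248,252,5,3),
     (252,254,6,1),
     (254,255,7,0),
     (255,256,8,0),
    )
# fill both tables for every octet covered by each row
for a,b,c,d in ctl:
    for o in range(a,b):
        num[o]=c
        bit[o]=o&d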
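
For illustration only, here is one way the two tables could drive a decoder. This is a minimal sketch, not the poster's actual code: `decode_one` is a hypothetical helper, the 248-252 row is assumed to be four fields (248,252,5,3), continuation octets are assumed to contribute their low 6 bits, and there are no checks for overlong forms or surrogates.

```python
# Rebuild the tables as in the post (248-252 row written as four fields).
num = bytearray([0]) * 256
bit = bytearray([0]) * 256
ctl = (
    (0, 128, 1, 255),
    (128, 192, 0, 0),
    (192, 224, 2, 31),
    (224, 240, 3, 15),
    (240, 248, 4, 7),
    (248, 252, 5, 3),
    (252, 254, 6, 1),
    (254, 255, 7, 0),
    (255, 256, 8, 0),
)
for a, b, c, d in ctl:
    for o in range(a, b):
        num[o] = c
        bit[o] = o & d


def decode_one(data, pos=0):
    """Hypothetical helper: decode one code point from a bytearray.

    Returns (code_point, next_pos). Assumes data is a bytearray so that
    indexing yields ints on both Python 2.7 and 3.x.
    """
    lead = data[pos]
    length = num[lead]
    # 0 means a stray continuation octet; 7 and 8 mark octets 254 and 255
    if length == 0 or length > 6:
        raise ValueError("invalid lead octet at %d" % pos)
    if pos + length > len(data):
        raise ValueError("truncated sequence at %d" % pos)
    cp = bit[lead]  # payload bits of the lead octet
    for k in range(pos + 1, pos + length):
        o = data[k]
        if num[o] != 0:  # every following octet must be a continuation
            raise ValueError("expected continuation octet at %d" % k)
        cp = (cp << 6) | (o & 0x3F)  # append the low 6 bits
    return cp, pos + length
```

With these assumptions, `decode_one(bytearray(b"\xe2\x82\xac"))` would walk three octets and return the code point for the euro sign along with the next position.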
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
This entry in the documentation seems to ensure your requirement of contiguous bytes in memory. However, if you're creating read-only tables, why not use the bytes type directly?
num = bytes(c for a, b, c, d in ctl for o in range(a, b))
bit = bytes(o & d for a, b, c, d in ctl for o in range(a, b))
#3
i don't require contiguous bytes. but since all the items are < 256, then bytes are usable and they can be contiguous. a contiguous lookup would be faster than, for example, a list of ints, which was in earlier code prototypes. the bytes type does look good in Python3. but in Python2, bytes == str. in some places i split the code based on Python2 vs. Python3, while i try to make most of the code work in both Python2 and Python3. my goal for my UTF-8 code and my Escape Sequence code is to make everything work in 2.7 and 3.x as much as i can.
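
The Python2 vs. Python3 difference mentioned here shows up when indexing. A small sketch, with made-up names `table`, `value`, `first` and `as_int`: indexing a bytearray gives an int on both 2.7 and 3.x, while indexing bytes gives an int only on 3.x (on 2.7 it gives a one-character str).

```python
import sys

# bytearray: mutable, one octet per slot, same behavior on 2.7 and 3.x
table = bytearray([0]) * 256
table[0xE2] = 3
value = table[0xE2]  # an int on both versions

# bytes: b"\x03"[0] is the int 3 on Python 3, but the str "\x03" on Python 2
first = b"\x03"[0]
if sys.version_info[0] >= 3:
    as_int = first
else:
    as_int = ord(first)  # Python 2 needs an explicit conversion
```

So a lookup table held in a bytearray can be indexed by the same code on both versions, while a bytes table needs a version split or an `ord()` call on 2.7.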

one thing i am putting some thought into is whether someone might want UTF-8 results back in a byte type even though they had to give the Unicode data in a type that supports the full range of Unicode code points (a list of ints in both versions, str in Python3, unicode in Python2). originally i was going to return the same type as given. going from UTF-8 to Unicode is the easier case to see: even if the UTF-8 is given as bytes, the Unicode result probably can't be, so i will need to return something bigger. going the other way has a different issue. if the Unicode is given as some large type (which the caller usually must do), is that the type they want the UTF-8 in, or do they want it in a byte type? so i'm thinking of adding support for a returntype= option to let the caller specify.
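
A returntype= option along these lines might look like the following. This is only a sketch under assumptions: `encode_utf8` is a hypothetical name, `returntype` is assumed to be any callable that accepts a bytearray, and the encoder stops at U+10FFFF (4 octets) rather than covering the 5 and 6 octet forms that the tables above allow for.

```python
def encode_utf8(codepoints, returntype=bytearray):
    """Hypothetical: encode an iterable of int code points as UTF-8.

    returntype is called on the finished bytearray, so bytes, bytearray
    and list all work as values.
    """
    out = bytearray()
    for cp in codepoints:
        if cp < 0:
            raise ValueError("negative code point")
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x110000:
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:
            raise ValueError("code point out of range")
    return returntype(out)
```

The caller would then pick the container, for example `encode_utf8([0x20AC], returntype=bytes)` for a byte type or `returntype=list` for a list of ints, independent of the type the code points were given in.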
#4
oh, now i see the intent of your suggestion. the reason i have the code this way is to ensure that, even if i modify the ctl table, the generated tables will always be 256 in length and every value will be stored at the right index. the tables are initialized to the full length with a default of 0, which stays in place at any index that scanning ctl never stores to, and the result is the same if the rows are reordered. for example:

ctl=(
     (192,224,2,31),
     (224,240,3,15),
     (240,248,4,7),
     (252,254,6,1),
     (254,255,7,0),
     (255,256,8,0),
     (0,128,1,255),
    )
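
To make the difference concrete, here is a sketch comparing the two ways of building a table from the reordered, incomplete ctl above. The name `packed` is made up for the illustration, and bytearray is used for the generator form so the same code runs on 2.7 and 3.x.

```python
ctl = (
    (192, 224, 2, 31),
    (224, 240, 3, 15),
    (240, 248, 4, 7),
    (252, 254, 6, 1),
    (254, 255, 7, 0),
    (255, 256, 8, 0),
    (0, 128, 1, 255),
)

# Preinitialized table: always 256 long, each value stored at its own index,
# and any octet not covered by ctl keeps the default 0.
num = bytearray([0]) * 256
for a, b, c, d in ctl:
    for o in range(a, b):
        num[o] = c

# Generator form: values are laid down in row order with no gaps filled,
# so the index no longer matches the octet when rows are moved or missing.
packed = bytearray(c for a, b, c, d in ctl for o in range(a, b))
```

Here `num` still has 256 entries with `num[0] == 1` and `num[0xC3] == 2`, while `packed` comes out shorter than 256 and starts with the value for octet 192.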
#5
I see. It is probably the best way.