Python Forum
raw byte of integer
#1
Hello!
I am brand new, so bear with me. I am trying to implement a CRC-32 with a specific polynomial in Python. That seems to be working, but it's horribly slow: C++ does 2 million of these CRC-32s plus an MD5 in 0.4 seconds or less, Java did it in 1.5 seconds, and Python is taking 6-7 seconds for just 1 million CRCs! My testing indicates that the shift operator (<< 24) is the main issue. I tried division instead (no good), and I am now looking at ctypes (I am proficient in C and C++) to see if I can directly access the offending byte.

I want to take a ctypes pointer to a 4-byte integer and get its third byte (as an unsigned byte).
I have been messing with the cast function and pointers for a couple of hours now and can't seem to nail down the syntax, despite finding similar (but not quite what I want) examples online.

from ctypes import c_uint, c_byte, cast, POINTER

i = c_uint(12345678)
cast(i, POINTER(c_byte*4)())
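For reference, a minimal sketch of a cast that does what the post describes, assuming the integer is stored in the machine's native byte order. `ctypes.cast` expects a pointer object as its first argument and a pointer *type* as its second, so the integer is wrapped with `ctypes.pointer()` and the target is `POINTER(c_ubyte * 4)` itself, not an instance of it. The variable names `raw` and `third` are only illustrative.

```python
import ctypes
import sys

i = ctypes.c_uint32(12345678)  # 0x00BC614E; c_uint32 guarantees 4 bytes

# Reinterpret the memory of i as an array of 4 unsigned bytes (no copy).
raw = ctypes.cast(ctypes.pointer(i), ctypes.POINTER(ctypes.c_ubyte * 4)).contents

# raw is indexed in memory order, so raw[2] is the third byte in memory.
# On a little-endian machine (e.g. x86) that is bits 16-23 of the value.
third = raw[2]
print(sys.byteorder, list(raw), third)
```

On a little-endian machine this should print `little [78, 97, 188, 0] 188`. This answers the syntax question, but it is unlikely to be faster than plain shifting, since each ctypes call adds its own Python-level overhead.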

Can anyone point me in the right direction? Or am I taking the wrong approach, and if so, is there a fast way to do x >> 24 (and similar) in Python? I know I could call my C code, but we don't use C at my company, hardly anyone knows it, and it's marked "avoid at all costs" (annoying, but that is how it is). I am not expecting 0.4 seconds, but I would love to cut it down to at least Java's speed.
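In pure Python, the usual way to pull one byte out of an int is a shift followed by a mask. A short sketch; the helper name `byte_at` is hypothetical, and it counts bytes from the least significant end, so "third byte" means n=2 here.

```python
def byte_at(x: int, n: int) -> int:
    """Return byte n of x, counting from the least significant byte (n=0)."""
    return (x >> (8 * n)) & 0xFF

x = 12345678                # 0x00BC614E
top = (x >> 24) & 0xFF      # most significant byte of a 32-bit value
print(hex(top), hex(byte_at(x, 2)), hex(byte_at(x, 0)))  # 0x0 0xbc 0x4e

# The same byte via a big-endian byte string: index 1 counts from the top.
same = x.to_bytes(4, "big")[1]
```

The shift itself is a single operation; in a tight loop, most of the time likely goes to per-operation interpreter overhead rather than to `>>` specifically.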
#2
I'm still not getting anywhere with this. Why am I getting an @ symbol here?
import struct

x = 123456
y = x.to_bytes(4, byteorder='big')
print(y)
y = struct.pack("I", x)
print(y)
Output:
b'\x00\x01\xe2@'
b'@\xe2\x01\x00'
#3
When printing a bytes object, Python shows bytes that are printable ASCII characters as those characters.
To see the raw values, convert it to a list of integers, e.g.
list(y)
Output:
[64, 226, 1, 0]
So '@' is the byte 64, the ASCII code of the @ symbol.
You can access the third byte with y[2], which returns an int.
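The two outputs in #2 are also in opposite orders because a `struct` format with no prefix character uses the machine's native byte order (little-endian on x86), while `to_bytes` was asked for big-endian. A small sketch making the order explicit with `<` (little-endian) and `>` (big-endian):

```python
import struct

x = 123456  # 0x0001E240

big = x.to_bytes(4, byteorder="big")   # most significant byte first
little = struct.pack("<I", x)          # "<" forces little-endian, 4-byte size
big_via_struct = struct.pack(">I", x)  # ">" matches to_bytes(..., "big")

print(big)               # b'\x00\x01\xe2@'  (0x40 is printed as '@')
print(list(big))         # [0, 1, 226, 64]
print(big[2], little[2])  # 226 1  (indexing bytes gives an int)
```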
#4
Thanks!
That worked, but once I finally had it all in place it was even slower. I also tried shifting up by multiplying by 256 (instead of << 8) and masking the result back to 32 bits, but that was slow too.
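Because Python ints never overflow, a left shift (or a multiplication by 256) keeps every bit, and the mask is what emulates a 32-bit register. A quick check that the two spellings agree; `crc` here is just an example value:

```python
MASK32 = 0xFFFFFFFF

crc = 0xDEADBEEF
shifted = crc << 8              # Python keeps all 40 bits
by_shift = shifted & MASK32     # drop bits above bit 31, like a uint32_t shift in C
by_mult = (crc * 256) & MASK32  # multiplying by 256 is the same as << 8

print(hex(shifted), hex(by_shift), hex(by_mult))  # 0xdeadbeef00 0xadbeef00 0xadbeef00
```

Since the results are identical, swapping `<<` for `*` is unlikely to change much; the per-iteration interpreter overhead probably dominates either way.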
I found a library that does it, crccheck, but it is even slower: 21 seconds per million.
So far my only win was cutting out calls to ord(char), which somehow takes forever to convert a character to its numeric code (which is what it already is internally?!).
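In Python 3, iterating over a `bytes` object already yields integers, so a per-character `ord()` call in the hot loop can usually be replaced by encoding the string once, up front. A sketch, assuming the input is ASCII text; `text` is a placeholder:

```python
text = "HELLO"

# One ord() call per character inside the loop.
codes_via_ord = [ord(ch) for ch in text]

# Encode once; iterating over bytes gives ints directly, no ord() needed.
data = text.encode("ascii")
codes_via_bytes = list(data)

print(codes_via_ord, codes_via_bytes)  # [72, 69, 76, 76, 79] [72, 69, 76, 76, 79]
```

For non-ASCII text the two differ: `ord` returns a Unicode code point, while encoding yields the encoded bytes (e.g. UTF-8), which is usually what a byte-oriented CRC expects.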
I feel like I am missing something really obvious here. Is there an optimization flag I should pass when running the script, or something similar that I left off?
The target workload will sometimes have to process 70M records (each calling the CRC twice), and it also needs to be virtually instant for one-at-a-time requests. Realistically, the single requests are probably fine, but at this rate the bulk runs will be like a 1980s overnight batch job (and this is only a small part of the process).
#5
I knocked another second off by condensing all the temporary variables into a one-liner.
Does anyone see anything else I could do here? The UNSIGNED lambda (my stand-in for a C macro) just masks the value back to 32 bits after shifting up: a CPU shift would discard the bits that go off the end, but Python ints just keep growing.
crctab is just the polynomial expressed as a lookup table of 256 precomputed 32-bit values, one for each possible byte.
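UNSIGNED = lambda n: n & 0xffffffff  # keep only the low 32 bits

def crc32p(b):
    crc = 0
    for c in b:  # c must be an int, e.g. b is a bytes object
        # shift the register up one byte and XOR in the table entry for the old top byte
        crc = (UNSIGNED(crc << 8) ^ (crctab[((crc >> 24) ^ c)]))
    return crc ^ 0xFFFFFFFF  # some CRCs need a final XOR; mine does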
That gets it to 4.3 seconds per million.
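The thread doesn't show how crctab is built or which polynomial is used. The sketch below assumes a non-reflected (MSB-first) CRC-32 with the common polynomial 0x04C11DB7, which fits the shift-left structure of crc32p; substitute the real polynomial. It also applies two common CPython micro-optimizations to the loop: inlining the mask instead of calling the UNSIGNED lambda (one fewer Python function call per byte), and binding the table as a default argument so it is read as a local rather than a global. How much these help has not been measured here. The names `make_table`, `crc32p_fast` and `crc32_bitwise` are hypothetical.

```python
POLY = 0x04C11DB7   # assumption: replace with the actual polynomial
MASK32 = 0xFFFFFFFF

def make_table(poly=POLY):
    """Build the 256-entry lookup table for an MSB-first CRC-32."""
    table = []
    for i in range(256):
        crc = i << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ poly) & MASK32
            else:
                crc = (crc << 1) & MASK32
        table.append(crc)
    return tuple(table)

CRCTAB = make_table()

def crc32p_fast(data, table=CRCTAB):
    """Same steps as crc32p: init 0, MSB-first, final XOR with 0xFFFFFFFF."""
    crc = 0
    for c in data:  # iterating bytes yields ints
        crc = ((crc << 8) & 0xFFFFFFFF) ^ table[(crc >> 24) ^ c]
    return crc ^ 0xFFFFFFFF

def crc32_bitwise(data, poly=POLY):
    """Slow bit-at-a-time reference, only for checking the table version."""
    crc = 0
    for c in data:
        crc ^= c << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ poly) & MASK32
            else:
                crc = (crc << 1) & MASK32
    return crc ^ 0xFFFFFFFF
```

To compare against the original loop, time both on the same input, e.g. `timeit.timeit(lambda: crc32p_fast(payload), number=100_000)` with a representative `payload`.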