Python Forum
binary data in source code form




binary data in source code form - Skaperen - May-23-2017

i have some big binary data i need to embed in a python source file with the intention that it will be written to a binary file.  one way to do this is to store it as a string literal with backslash encoding.  is there a more efficient way to do it?  would big integer literals be more efficient?  assume the binary data is already compressed and cannot be further compressed.  would base64 be better?  base85?


RE: binary data in source code form - Ofnuts - May-23-2017

If it's binary, don't put it in a string. Use a byte array.
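
A minimal sketch of that suggestion: keep the payload as a bytes literal (b'...') instead of a str, and write it out with the file opened in binary mode. The payload value and the file name out.bin are made up for illustration.

```python
import os
import tempfile

# Hypothetical payload: a bytes literal, not a str.
DATA = b'\x00\xff\xff\x00\x00\xff\xff\x03'

# Open the target in binary mode ('wb') so no text encoding or
# newline translation is applied on the way out.
path = os.path.join(tempfile.mkdtemp(), 'out.bin')
with open(path, 'wb') as f:
    f.write(DATA)

# Read it back to confirm the round trip is byte-for-byte exact.
with open(path, 'rb') as f:
    restored = f.read()

print(restored == DATA)
```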


RE: binary data in source code form - Larz60+ - May-23-2017

this is Doug Hellmann's (author of Python Standard Library) blog on binary manipulations: https://pymotw.com/3/struct/


RE: binary data in source code form - nilamo - Jun-01-2017

Base64 is how binary images are stored in source fairly frequently. Not sure if that helps you or not.
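
For reference, a base64 round trip with the standard library might look like the sketch below. The 16-byte payload is a made-up sample; in practice the encoded literal would be generated once and pasted into the source file.

```python
import base64

# Hypothetical sample payload (16 bytes).
payload = bytes(range(16))

# b64encode returns ASCII-only bytes that are safe inside a literal.
encoded = base64.b64encode(payload)

# The line that would go into the generated source file.
source_line = 'DATA = base64.b64decode(%r)' % encoded
print(source_line)

# Decoding gives back the original bytes.
decoded = base64.b64decode(encoded)
print(decoded == payload)
```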


RE: binary data in source code form - Skaperen - Jun-02-2017

(May-23-2017, 05:49 AM)Ofnuts Wrote: If it's binary, don't put it in a string. Use a byte array.

that's really what i mean.  it's still written like a string.  what i am wondering is if ints could be a more efficient way to express it.  my goal is to make the file that looks like python source code as small as possible.

data = b'\000\377\377\000\000\377\377\003\000\177\177\005\006\377\377\001'

vs.

data = ( 72056494543077123, 35886981611257601 )

vs.

data = 1329207713684792564064364514848538369

it looks like int would be more efficient.
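
A sketch of how the int forms could be converted back to bytes, assuming big-endian byte order (which is what the three literals above correspond to). One caveat: an int does not record leading zero bytes, and this sample starts with \000, so the original length has to be stored or known separately.

```python
import struct

data = b'\000\377\377\000\000\377\377\003\000\177\177\005\006\377\377\001'

# One big int for the whole buffer.
as_int = int.from_bytes(data, 'big')
print(as_int)

# The length must be supplied: the int alone would lose the leading
# zero byte and come back one byte short.
restored = as_int.to_bytes(len(data), 'big')

# The tuple form: two unsigned 64-bit big-endian integers.
pair = struct.unpack('>2Q', data)
restored_from_pair = struct.pack('>2Q', *pair)

# Characters used in source: 4 per byte for octal escapes,
# versus the number of decimal digits in the int.
print(4 * len(data), len(str(as_int)))
```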

(Jun-01-2017, 09:51 PM)nilamo Wrote: Base64 is how binary images are stored in source fairly frequently.  Not sure if that helps you or not.
that is one of the first ways i looked at. but it is rather inefficient. it's better than hexadecimal, but only by 50% (6 bits per character versus 4). in terms of the convenience of available tools for conversion, it is a good choice. but look at how you would embed it in source code. you would have a big string of the base64 characters and the calls to convert that string. that would take up a lot of source code space.
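
One way to check these size estimates is to measure them. The sketch below encodes 3000 bytes of random data (standing in for already-compressed data; the size and the names are arbitrary) and counts the characters each form would occupy in source, ignoring quotes and the decoding call. base64.b85encode is used for base85 because its alphabet contains no quotes or backslashes.

```python
import base64
import os

# Random bytes as a stand-in for incompressible data.
raw = os.urandom(3000)

# Number of source characters needed by each representation.
sizes = {
    'hex': len(raw.hex()),
    'decimal int': len(str(int.from_bytes(raw, 'big'))),
    'base64': len(base64.b64encode(raw)),
    'base85': len(base64.b85encode(raw)),
}

for name, n in sizes.items():
    print('%-12s %5d chars  %.2f chars/byte' % (name, n, n / len(raw)))
```

A decimal digit carries about 3.3 bits, so a decimal int literal needs roughly 2.4 characters per byte, while base64 and base85 need about 1.33 and 1.25. The decoding call adds a fixed overhead that does not grow with the data.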