Python Forum
4 byte hex byte swap from binary file - Printable Version




4 byte hex byte swap from binary file - medievil - May-07-2018

I can't seem to find a solution that works.
The program I am working on takes a bunch of specially formatted texture files and dumps them into one large file with a special header (and subheaders).
The issue I am running across seems really simple, but the solution is anything but.
I need to read a 4-byte location (easy enough) to get the data size, then byte swap it, i.e. read aa bb cc dd and turn it into dd cc bb aa, so I can use the size as an offset to the next texture. Everything I try seems to turn it into the ASCII equivalent instead of using the actual hex data. I don't understand why using the raw data is so difficult. I have done exhaustive searches and tried array.array and struct.pack/unpack while changing the endianness.
Anyone got an easy method?


RE: 4 byte hex byte swap from binary file - killerrex - May-07-2018

What you are describing is easy to do with the bytes type:
# This can be the result from
# with open('file.bin', 'rb') as fd:
#     a = fd.read(4)
>>> a = b'\01\02\03\04' 
>>> a[::-1]
b'\x04\x03\x02\x01'
If your problem is that the numbers in the binary file are stored with a different endianness than your machine uses, you can use the from_bytes method of the int class:
>>> int.from_bytes(a, 'big')
16909060
>>> hex(int.from_bytes(a, 'big'))
'0x1020304'
>>> int.from_bytes(a, 'little')
67305985
>>> hex(int.from_bytes(a, 'little'))
'0x4030201'
The bytes type behaves almost like a string or a list, and is really practical for dealing with binary files. You might also want to look at the struct module.
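
A minimal sketch of the same idea applied to a field inside a file. It uses io.BytesIO as a stand-in for an opened binary file; the helper name read_u32 and the sample offset are assumptions made for illustration, not something from the thread. It relies on struct rather than int.from_bytes, so it should also run on Python 2.7.

```python
import io
import struct

def read_u32(fd, offset, little_endian=True):
    """Read 4 bytes at `offset` and return them as an unsigned integer."""
    fd.seek(offset)
    raw = fd.read(4)
    if len(raw) != 4:
        raise ValueError("expected 4 bytes, got %d" % len(raw))
    fmt = '<I' if little_endian else '>I'
    return struct.unpack(fmt, raw)[0]

# Stand-in for open('file.bin', 'rb'): 4 padding bytes, then 01 02 03 04
fake_file = io.BytesIO(b'\xff\xff\xff\xff\x01\x02\x03\x04')

as_big = read_u32(fake_file, 4, little_endian=False)
as_little = read_u32(fake_file, 4, little_endian=True)
print(hex(as_big))     # 0x1020304
print(hex(as_little))  # 0x4030201
```

Choosing the byte order when interpreting the bytes avoids reversing them by hand first.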


RE: 4 byte hex byte swap from binary file - DeaD_EyE - May-07-2018

If you're reading or processing binary data, you should take a look at the struct module.

In addition you can use third party modules, if your task is more complex.
https://pypi.org/project/bitarray/

There are similar modules like this. I haven't tried them yet.


RE: 4 byte hex byte swap from binary file - medievil - May-07-2018

(May-07-2018, 08:21 AM)killerrex Wrote: What you are describing is easy to do with the bytes type: [...]
Possibly because you are working on Python 3 and I am on 2.7, the first example didn't work. I used:
import os
f = open("y:/test1/canyon(msncyn).mtx", 'rb')
f.seek(48)
a = f.read(4)
a = b'\01\02\03\04' 
a[::-1]
w = open("y:/test1/test.mtx", 'wb')
w.write(a)
f.close
w.close
as a test case. The bytes read are 00 80 00 00,
but the bytes written back were 01 02 03 04.
(May-07-2018, 04:19 PM)DeaD_EyE Wrote: If you're reading/processing binary data, you should take a look into struct. [...]
I tried using struct, though maybe I am missing something: it returns the ASCII value and not the data value. From the above example, my 00 80 00 00 turns into 30s with a 38 thrown in.
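
One possible explanation for the 30s and the 38 (an assumption, since the code that produced them was not posted): the bytes were turned into hexadecimal text somewhere along the way. The text "00800000" consists of the characters '0' and '8', whose ASCII codes are 0x30 and 0x38, so writing that text to a file gives exactly that pattern. The sketch below contrasts the raw bytes with their hex text.

```python
import binascii
import struct

raw = b'\x00\x80\x00\x00'            # the 4 bytes as they sit in the file

# Hex *text* describing those bytes: 8 ASCII characters, not 4 data bytes
hex_text = binascii.hexlify(raw)

# Hex dump of that text: the ASCII codes of '0' (0x30) and '8' (0x38)
hex_of_text = binascii.hexlify(hex_text)

# Interpreting the raw bytes directly gives the numeric value instead
value = struct.unpack('<I', raw)[0]

print(hex_text)      # the characters 00800000
print(hex_of_text)   # the characters 3030383030303030
print(value)         # 32768
```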


RE: 4 byte hex byte swap from binary file - DeaD_EyE - May-07-2018

It's good to know which datatype it should be.
Which format is the texture?

In [8]: struct.unpack('<i', b'\x00\x80\x00\x00')   # < is little endian
Out[8]: (32768,)
In [9]: struct.unpack('>i', b'\x00\x80\x00\x00')   # > is big endian, i is a 4-byte signed integer
Out[9]: (8388608,)
Open the file in binary mode, seek to the right position, then read the data.

import struct

def read_size(file):
    """
    This function returns the size of a texture in bytes.
    """
    with open(file, 'rb') as fd:
        fd.seek(1337) # go to the right offset, 1337 is just an example
        size = fd.read(4) # read the 4 raw bytes of the size field
    # '<i' = little-endian signed 32-bit integer; unpack returns a tuple
    return struct.unpack('<i', size)[0]
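
To use the size as an offset to the next texture, as the original question asks, the read can be repeated in a loop. This is only a sketch under assumed conditions: each record is taken to be a 64-byte header with a little-endian size field at offset 48, followed directly by that many bytes of data. The names iter_textures and make_record are hypothetical, and the sample data is made up.

```python
import io
import struct

HEADER_SIZE = 64   # assumed: data starts 0x40 bytes after each header start
SIZE_OFFSET = 48   # assumed: the size field sits at offset 0x30 in the header

def iter_textures(fd):
    """Yield (data_offset, size) for each record until the file ends."""
    pos = 0
    while True:
        fd.seek(pos)
        header = fd.read(HEADER_SIZE)
        if len(header) < HEADER_SIZE:
            break  # no complete header left
        size = struct.unpack_from('<I', header, SIZE_OFFSET)[0]
        yield pos + HEADER_SIZE, size
        pos += HEADER_SIZE + size  # skip the data to reach the next header

def make_record(payload):
    """Build a fake record: zeroed header with only the size field filled in."""
    header = bytearray(HEADER_SIZE)
    struct.pack_into('<I', header, SIZE_OFFSET, len(payload))
    return bytes(header) + payload

sample = io.BytesIO(make_record(b'A' * 10) + make_record(b'B' * 3))
records = list(iter_textures(sample))
print(records)  # [(64, 10), (138, 3)]
```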



RE: 4 byte hex byte swap from binary file - killerrex - May-07-2018

(May-07-2018, 04:40 PM)medievil Wrote: possibly because you are working on python 3 and I am on 2.7, but the first example didn't work [...]

You are right about the from_bytes part: it does not exist in Python 2. I hope there is a good reason to work with Python 2; otherwise, for new development, I strongly recommend using Python 3.

About your code, be careful, you are not testing what you think. Try with:
f = open("y:/test1/canyon(msncyn).mtx", 'rb')
f.seek(48)
a = f.read(4)
f.close()
print('I read from file ' + str(a))

# Fixed test value: this replaces the 4 bytes read above
a = b'\01\02\03\04'
# Important! The reversal is not done in place; it returns a new object
b = a[::-1]
w = open("y:/test1/test.mtx", 'wb')
w.write(b)
w.close()



RE: 4 byte hex byte swap from binary file - medievil - May-08-2018

(May-07-2018, 08:34 PM)DeaD_EyE Wrote: It's good to know which datatype it should be. Which format is the texture? [...]
They are DXT1 and DXT5 DDS textures, but the headers are non-standard:


Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 73 6D 63 61 6E 79 6F 6E 32 00 00 00 00 00 00 00 smcanyon2.......
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000020 00 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 ................
00000030 00 80 00 00 08 00 00 00 41 00 00 00 01 00 00 00 .€......A.......
That's the header; the data starts at 00000040.
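
As a sanity check on the byte order, the posted header bytes can be decoded as little endian. This is a sketch under assumptions that are not confirmed anywhere in the thread: that the two fields at 0x20 and 0x24 are the width and height, and that this texture is DXT1, which stores each 4x4 pixel block in 8 bytes. Under those assumptions the expected data size agrees with the little-endian reading of 00 80 00 00.

```python
import struct

# The 64 header bytes from the dump above
header = (
    b'smcanyon2' + b'\x00' * 23 +           # 0x00-0x1F: name and padding
    b'\x00\x01\x00\x00\x00\x01\x00\x00' +   # 0x20-0x27: two 4-byte fields
    b'\x00' * 8 +                           # 0x28-0x2F
    b'\x00\x80\x00\x00\x08\x00\x00\x00' +   # 0x30-0x37: size field first
    b'\x41\x00\x00\x00\x01\x00\x00\x00'     # 0x38-0x3F
)

# Assumed meaning of the fields at 0x20 and 0x24: width and height
width, height = struct.unpack_from('<2I', header, 0x20)
size = struct.unpack_from('<I', header, 0x30)[0]

# DXT1 stores each 4x4 pixel block in 8 bytes
expected = (width // 4) * (height // 4) * 8

print(size)      # 32768
print(expected)  # 32768
```

Read as big endian, the same 4 bytes would give 8388608, which does not match a 256 x 256 DXT1 texture.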

(May-07-2018, 09:53 PM)killerrex Wrote: About your code, be careful, you are not testing what you think. [...]
The "I read from file" output shows Ç.
The test file has 04 03 02 01 instead of 00 00 80 00 (the reverse of the 00 80 00 00 that was read).

I am on 2.7 for learning purposes; moving to 3 later will be much easier if I learn 2.7 really well.
I'm new to all of it, except for some 8-bit assembly back in the day and lots and lots of BASIC back in high school, lol.


RE: 4 byte hex byte swap from binary file - killerrex - May-08-2018

(May-08-2018, 12:53 AM)medievil Wrote: I am on 2.7 for learning purposes; moving to 3 later will be much easier if I learn 2.7 really well.
I'm new to all of it, except for some 8-bit assembly back in the day and lots and lots of BASIC back in high school, lol.

In reality it is exactly the opposite. Python 2.7 is full of quirks and tricks accumulated over the years, with many traps that you need to learn the hard way. In Python 3 things are much more uniform and organised around well-defined concepts, so learning is much easier.
In particular, working with binary files is much easier in Python 3 because the bytes type is simply a sequence of values between 0 and 255 and is not mixed up with strings (especially relevant the first time you work with something outside the ASCII table).
And remember that Python 2.7 is receiving only bug fixes and will be out of maintenance in less than 2 years...

I have rewritten your example so that it works in both Python 2 and 3:
#!/usr/bin/env python3

# The contents of source.bin are the 1st 64 bytes:
# ~> hexdump -C source.bin 
# 00000000  00 01 02 03 04 05 06 07  08 09 0a 0b 0c 0d 0e 0f  |................|
# 00000010  10 11 12 13 14 15 16 17  18 19 1a 1b 1c 1d 1e 1f  |................|
# 00000020  20 21 22 23 24 25 26 27  28 29 2a 2b 2c 2d 2e 2f  | !"#$%&'()*+,-./|
# 00000030  30 31 32 33 34 35 36 37  38 39 3a 3b 3c 3d 3e 3f  |0123456789:;<=>?|
with open("source.bin", 'rb') as fd:
    # At byte 4 is 04 05 06 07
    fd.seek(4)
    a = fd.read(4)

print('I read from file {!r}'.format(a))

# Fixed test value: this replaces the 4 bytes read above
a = b'\01\02\03\04'
# Important! The reversal is not done in place; it returns a new object
b = a[::-1]

with open("output.bin", 'w+b') as fd:
    fd.write(b)
# And the result:
# ~> hexdump -C output.bin
# 00000000  04 03 02 01                                       |....|
I have also created a small example using struct to read the header you posted. Obviously I have no idea of the real field types, so I am assuming a 16-byte string (null terminated as in C) + 8 signed 32-bit integers + 2 floats + 1 double:
import struct

with open('demo.bin', 'rb') as fd:
    raw = fd.read()
print(raw)

# From here on, raw *MUST* be exactly 64 bytes long, or struct will complain,
# because the input must match the pattern exactly... split your input wisely
# 16s = 16-byte string, 8i = 8 signed ints, 2f = 2 floats, d = 1 double
# As big endian
big = struct.unpack('>16s8i2fd', raw)
print('Big:', big)

# As little endian:
little = struct.unpack('<16s8i2fd', raw)
print('Little:', little)
Output:
b'smcanyon2\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80\x00\x00\x08\x00\x00\x00A\x00\x00\x00\x01\x00\x00\x00'
Big: (b'smcanyon2\x00\x00\x00\x00\x00\x00\x00', 0, 0, 0, 0, 65536, 65536, 0, 0, 1.1754943508222875e-38, 3.851859888774472e-34, 131072.00048828125)
Little: (b'smcanyon2\x00\x00\x00\x00\x00\x00\x00', 0, 0, 0, 0, 256, 256, 0, 0, 4.591774807899561e-41, 1.1210387714598537e-44, 2.121995823e-314)
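
Two small follow-ups, as a sketch only: struct.calcsize can confirm that a format string covers exactly 64 bytes, and slicing the input to that length is one way to "split your input wisely" when the file continues past the header. The field layout is still the guess made above; the header bytes are rebuilt from the posted dump and the payload after them is made up.

```python
import struct

FMT = '<16s8i2fd'                    # guessed field layout, little endian
HEADER_SIZE = struct.calcsize(FMT)   # 16 + 8*4 + 2*4 + 8

# Header bytes rebuilt from the posted dump, plus made-up texture data
header = (
    b'smcanyon2' + b'\x00' * 23 +
    b'\x00\x01\x00\x00\x00\x01\x00\x00' + b'\x00' * 8 +
    b'\x00\x80\x00\x00\x08\x00\x00\x00' +
    b'\x41\x00\x00\x00\x01\x00\x00\x00'
)
file_contents = header + b'\xaa' * 100   # hypothetical payload

# Unpack only the header slice; the full buffer would not match the pattern
fields = struct.unpack(FMT, file_contents[:HEADER_SIZE])

# Cut the name at the first NUL byte, as a C string would be read
name = fields[0].split(b'\x00', 1)[0].decode('ascii')

print(HEADER_SIZE)   # 64
print(name)          # smcanyon2
```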