4 byte hex byte swap from binary file

medievil · (This post was last modified: May-07-2018, 05:01 AM by buran.)

I can't seem to find a solution that works.
The program I am working on basically takes a bunch of specially formatted texture files and dumps them into one large file with a special header (and subheaders).
The issue I am running across seems really simple, but the solution is anything but.
I need to read a 4-byte location (easy enough) to get the data size, so I have to read that location and then byte swap it, i.e. read aa bb cc dd and turn it into dd cc bb aa, so I can use the file size as an offset to the next texture. Everything I try seems to turn it into the ASCII equivalent and doesn't use the actual hex data. I don't understand why using the raw data is so difficult. I have done exhaustive searches and tried array.array and struct.pack/unpack while changing the endianness.
Anyone got an easy method?

killerrex · May-07-2018, 08:21 AM

What you are describing is easy to do with the bytes type:

# This can be the result from
# with open('file.bin', 'rb') as fd:
#     a = fd.read(4)
>>> a = b'\01\02\03\04' 
>>> a[::-1]
b'\x04\x03\x02\x01'

If your problem is that the numbers in the binary file are stored with a different endianness than your machine uses, you can use the from_bytes method of the int class:

>>> int.from_bytes(a, 'big')
16909060
>>> hex(int.from_bytes(a, 'big'))
'0x1020304'
>>> int.from_bytes(a, 'little')
67305985
>>> hex(int.from_bytes(a, 'little'))
'0x4030201'

The bytes type behaves almost like a string or a list and is really practical for dealing with binary files. You might also want to look at the struct module.
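
For interpreters that lack int.from_bytes (it was added in Python 3.2), the struct module can produce the same two numbers. This is only a minimal sketch: the variable names big and little are made up for illustration, and 'I' assumes the 4 bytes are an unsigned 32-bit value.

```python
import struct

# Same test bytes as in the session above.
a = b'\x01\x02\x03\x04'

# 'I' is an unsigned 32-bit integer; '>' selects big endian, '<' little endian.
# unpack always returns a tuple, so take element 0.
big = struct.unpack('>I', a)[0]
little = struct.unpack('<I', a)[0]

print(hex(big))
print(hex(little))
```

The values match the int.from_bytes results shown above (16909060 and 67305985).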

DeaD_EyE · (This post was last modified: May-07-2018, 04:20 PM by DeaD_EyE.)

If you're reading/processing binary data, you should take a look into struct.

In addition you can use third party modules, if your task is more complex.
https://pypi.org/project/bitarray/

There are similar modules like this. I haven't tried them yet.

medievil · May-07-2018, 04:40 PM

(May-07-2018, 08:21 AM)killerrex Wrote: What you are describing is easy to do with the bytes type: [...]

Possibly because you are working on Python 3 and I am on 2.7, but the first example didn't work. I used:

import os
f = open("y:/test1/canyon(msncyn).mtx", 'rb')
f.seek(48)
a = f.read(4)
a = b'\01\02\03\04' 
a[::-1]
w = open("y:/test1/test.mtx", 'wb')
w.write(a)
f.close
w.close

as a test case... bytes read are 00 80 00 00
bytes written back were 01 02 03 04

(May-07-2018, 04:19 PM)DeaD_EyE Wrote: If you're reading/processing binary data, you should take a look into struct. [...]

I tried using struct, though maybe I am missing something: it returns the ASCII value and not the data value. From the above example, my 00 80 00 00 turns into 30's with a 38 thrown in.

DeaD_EyE · May-07-2018, 08:34 PM

It's good to know which datatype it should be.
Which format is the texture?

In [8]: struct.unpack('<i', b'\x00\x80\x00\x00')   # < is little endian
Out[8]: (32768,)
In [9]: struct.unpack('>i', b'\x00\x80\x00\x00')   # > is big endian, i is a signed 4-byte integer
Out[9]: (8388608,)

You open the file in binary mode, seek to the right position, then you read the data.

import struct

def read_size(file):
    """
    This function returns the size of a texture in bytes.
    """
    with open(file, 'rb') as fd:
        fd.seek(1337) # go to the right offset, 1337 is just an example
        size = fd.read(4) # read the size
    return struct.unpack('<i', size)[0]
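
To connect this back to the original question, here is a hedged sketch of the byte swap itself, done on the raw data rather than on text. Everything named here is hypothetical (swap32, the fake in-memory file, the padding), and it assumes the size is stored as a little-endian unsigned 32-bit value at offset 48, which is the offset and byte pattern (00 80 00 00) reported earlier in the thread. It runs on both Python 2.7 and Python 3.

```python
import io
import struct

SIZE_OFFSET = 48  # offset used earlier in the thread; an assumption about the format


def read_size(stream, offset=SIZE_OFFSET):
    """Return (raw_bytes, value) for the 4-byte size field at offset."""
    stream.seek(offset)
    raw = stream.read(4)
    if len(raw) != 4:
        raise ValueError('expected 4 bytes, got %d' % len(raw))
    # '<I' = little-endian unsigned 32-bit integer (assumed byte order).
    return raw, struct.unpack('<I', raw)[0]


def swap32(raw):
    """Re-encode a little-endian 32-bit value as big-endian bytes."""
    value = struct.unpack('<I', raw)[0]
    return struct.pack('>I', value)


# Stand-in for a real texture file: 48 filler bytes, the size field, more filler.
fake = io.BytesIO(b'\x00' * 48 + b'\x00\x80\x00\x00' + b'\x00' * 12)

raw, size = read_size(fake)
swapped = swap32(raw)
print(size)           # the size as a number, usable as an offset
print(repr(swapped))  # the same 4 bytes in the opposite order
```

The integer is what you would add to the current position to find the next texture; the swapped bytes are what you would write if the output file needs the other byte order.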

killerrex · May-07-2018, 09:53 PM

(May-07-2018, 04:40 PM)medievil Wrote: possibly because you are working on python 3 and I am on 2.7, but the first example didn't work [...]

You are right about the from_bytes part: it does not exist in Python 2. I hope there is a good reason to work with Python 2; otherwise, for new development I strongly recommend using Python 3.

About your code, be careful, you are not testing what you think. Try with:

import os
f = open("y:/test1/canyon(msncyn).mtx", 'rb')
f.seek(48)
a = f.read(4)
f.close()
print('I read from file ' + str(a))

a = b'\01\02\03\04' 
# Important! The inversion is not in place!
b = a[::-1]
w = open("y:/test1/test.mtx", 'wb')
w.write(b)
w.close()

medievil · (This post was last modified: May-08-2018, 01:29 AM by medievil.)

(May-07-2018, 08:34 PM)DeaD_EyE Wrote: It's good to know which datatype it should be. Which format is the texture? [...]

They are DXT1 and DXT5 DDS files, but the headers are non-standard:

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 73 6D 63 61 6E 79 6F 6E 32 00 00 00 00 00 00 00 smcanyon2.......
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000020 00 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 ................
00000030 00 80 00 00 08 00 00 00 41 00 00 00 01 00 00 00 .€......A.......
That's the header before the data starts at 00000040.

(May-07-2018, 09:53 PM)killerrex Wrote: About your code, be careful, you are not testing what you think. [...]

The "I read from file" line shows Ç.
The test file has 04 03 02 01 instead of 00 00 80 00 (the inverse of the 00 80 00 00 that was read).

I am on 2.7 for learning purposes; moving to 3 later will be much easier if I learn 2.7 really well.
I am new to all of it, except for some 8-bit assembly back in the day and lots and lots of BASIC back in high school, lol.

killerrex · (This post was last modified: May-08-2018, 08:16 AM by killerrex.)

(May-08-2018, 12:53 AM)medievil Wrote: I am on 2.7 for learning purposes... moving to 3 later will be much easier if I learn 2.7 really well
new to all of it... except for some 8 bit assembly back in the day and lots and lots of Basic back in high school..lol

In reality it is exactly the opposite. Python 2.7 is full of quirks and tricks accumulated over the years, with many traps that you need to learn the hard way. In Python 3 things are much more uniform and organised around well-defined concepts, so learning is much easier.
In particular, working with binary files is much easier in Python 3 because the bytes type is by itself a sequence of values between 0 and 255 and is not mixed up with strings (especially relevant the first time you work with something outside the ASCII table).
And remember that Python 2.7 is receiving only bug fixes and will be out of maintenance in less than 2 years...

I have rewritten your example so that it works in both Python 2 and 3:

#!/usr/bin/env python3

# The contents of source.bin are the 1st 64 bytes:
# ~> hexdump -C source.bin 
# 00000000  00 01 02 03 04 05 06 07  08 09 0a 0b 0c 0d 0e 0f  |................|
# 00000010  10 11 12 13 14 15 16 17  18 19 1a 1b 1c 1d 1e 1f  |................|
# 00000020  20 21 22 23 24 25 26 27  28 29 2a 2b 2c 2d 2e 2f  | !"#$%&'()*+,-./|
# 00000030  30 31 32 33 34 35 36 37  38 39 3a 3b 3c 3d 3e 3f  |0123456789:;<=>?|
with open("source.bin", 'rb') as fd:
    # At byte 4 is 04 05 06 07
    fd.seek(4)
    a = fd.read(4)

print('I read from file {!r}'.format(a))

a = b'\01\02\03\04' 
# Important! The inversion is not in place!
b = a[::-1]

with open("output.bin", 'w+b') as fd:
    fd.write(b)
# And the result:
# ~> hexdump -C output.bin
# 00000000  04 03 02 01                                       |....|
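
Note that the script above still reassigns a to the test literal before reversing it, so the output does not depend on what was read from the file. A minimal sketch that reverses the bytes actually read might look like the following. The temporary directory and the generated source.bin are assumptions made only so that the example is self-contained; with a real file you would open your own path instead.

```python
import os
import tempfile

# Build a throwaway source.bin holding the bytes 00..3f, like the hexdump above.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'source.bin')
dst = os.path.join(workdir, 'output.bin')
with open(src, 'wb') as fd:
    fd.write(bytes(bytearray(range(64))))

with open(src, 'rb') as fd:
    fd.seek(4)        # at byte 4 the file holds 04 05 06 07
    a = fd.read(4)

# Slicing returns a new object; a itself is left unchanged.
b = a[::-1]

with open(dst, 'wb') as fd:
    fd.write(b)

# Read the output back to check what was really written.
with open(dst, 'rb') as fd:
    result = fd.read()
print(repr(result))
```

Here output.bin ends up holding 07 06 05 04, the reverse of the four bytes read at offset 4.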

I have also created a small example using struct to read the header you posted. Obviously I have no idea of the real field types, so I am assuming that it is a 16-byte string (null terminated as in C) + 8 signed 32-bit integers + 2 floats + 1 double:

import struct

with open('demo.bin', 'rb') as fd:
    raw = fd.read()
print(raw)

# From here the raw *MUST* measure only 64 bytes, or struct will complain
# as the input bytes must match exactly the pattern... split your input wisely
# 16s = 16-byte string, 8i = 8 signed 32-bit ints, 2f = 2 floats, d = 1 double
# As big endian
big = struct.unpack('>16s8i2fd', raw)
print('Big:', big)

# As little endian:
little = struct.unpack('<16s8i2fd', raw)
print('Little:', little)

Output:
b'smcanyon2\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80\x00\x00\x08\x00\x00\x00A\x00\x00\x00\x01\x00\x00\x00'
Big: (b'smcanyon2\x00\x00\x00\x00\x00\x00\x00', 0, 0, 0, 0, 65536, 65536, 0, 0, 1.1754943508222875e-38, 3.851859888774472e-34, 131072.00048828125)
Little: (b'smcanyon2\x00\x00\x00\x00\x00\x00\x00', 0, 0, 0, 0, 256, 256, 0, 0, 4.591774807899561e-41, 1.1210387714598537e-44, 2.121995823e-314)
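
If the buffer is longer than the header (for example the whole texture file), struct.unpack_from avoids the exact-length requirement because it reads only as many bytes as the format needs, starting at a given offset. The sketch below is built on assumptions: the field layout is the guess described above, the header bytes are typed in from the hex dump posted earlier, the trailing payload is invented, and treating offset 48 as a little-endian unsigned size is a guess about the format.

```python
import struct

# Guessed layout: 16-byte name, 8 signed ints, 2 floats, 1 double (little endian).
HEADER = struct.Struct('<16s8i2fd')

# 64 header bytes typed in from the hex dump, followed by made-up payload bytes.
raw = (
    b'smcanyon2' + b'\x00' * 23
    + b'\x00\x01\x00\x00\x00\x01\x00\x00' + b'\x00' * 8
    + b'\x00\x80\x00\x00\x08\x00\x00\x00\x41\x00\x00\x00\x01\x00\x00\x00'
    + b'texture data would follow here'
)

# unpack_from ignores everything after the bytes it needs.
fields = HEADER.unpack_from(raw, 0)
name = fields[0].split(b'\x00', 1)[0]  # cut the C-style string at the first NUL

# Read only the 4 bytes at offset 48 as an unsigned 32-bit integer.
size = struct.unpack_from('<I', raw, 48)[0]

print(HEADER.size)
print(repr(name))
print(size)
```

HEADER.size is a quick way to confirm that a guessed format really covers 64 bytes before using it on real files.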
