 4 byte hex byte swap from binary file
#1
I can't seem to find a solution that works.
The program I am working on basically takes a bunch of specially formatted texture files and dumps them into one large file with a special header (and subheaders).
The issue I am running across seems really simple, but the solution is anything but.
I need to read a 4-byte location (easy enough) to get the data size, so I have to read that location and then byte swap it, i.e. read aa bb cc dd and turn it into dd cc bb aa, so I can use the size as an offset to the next texture. Everything I try seems to turn it into its ASCII equivalent and doesn't use the actual hex data. I don't understand why using the raw data is so difficult. I have done exhaustive searches and tried array.array and struct.pack/unpack while changing the endianness.
Anyone got an easy method?
buran wrote May-07-2018, 05:01 AM:
Please, use meaningful thread titles that describe your problem.
EDIT: Thank you for editing the title
#2
What you are describing is easy to do with the bytes type:
# This can be the result from
# with open('file.bin', 'rb') as fd:
#     a = fd.read(4)
>>> a = b'\01\02\03\04' 
>>> a[::-1]
b'\x04\x03\x02\x01'
If your problem is that the numbers in the binary file are stored with a different endianness than your machine uses, you can use the from_bytes method of the int class:
>>> int.from_bytes(a, 'big')
16909060
>>> hex(int.from_bytes(a, 'big'))
'0x1020304'
>>> int.from_bytes(a, 'little')
67305985
>>> hex(int.from_bytes(a, 'little'))
'0x4030201'
The bytes type behaves almost like a string or a list, and is really practical for dealing with binary files. You might also want to look at the struct module.
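To tie the two approaches together, here is a minimal sketch (Python 3 only, since it uses int.from_bytes) showing that reversing the bytes and reading them as big endian gives the same number as reading the original bytes as little endian. The variable names are only illustrative.

```python
import struct

a = b'\x01\x02\x03\x04'  # same 4 bytes as in the example above

# Reversing the bytes and reading them as big endian...
swapped = int.from_bytes(a[::-1], 'big')
# ...gives the same number as reading the original bytes as little endian.
little = int.from_bytes(a, 'little')
# struct does the same job; '<I' means little-endian unsigned 32-bit integer.
(via_struct,) = struct.unpack('<I', a)

print(swapped, little, via_struct)
```

So in most cases there is no need to swap the bytes by hand: telling from_bytes or struct which byte order the file uses is enough.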
#3
If you're reading/processing binary data, you should take a look into struct.

In addition you can use third party modules, if your task is more complex.
https://pypi.org/project/bitarray/

There are similar modules like this. I haven't tried them yet.
My code examples are always for Python >=3.6.0
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
#4
(May-07-2018, 08:21 AM)killerrex Wrote: What you are describing is easy to do with the bytes type: [...]
Possibly because you are working on Python 3 and I am on 2.7, but the first example didn't work. I used
import os
f = open("y:/test1/canyon(msncyn).mtx", 'rb')
f.seek(48)
a = f.read(4)
a = b'\01\02\03\04' 
a[::-1]
w = open("y:/test1/test.mtx", 'wb')
w.write(a)
f.close
w.close
as a test case... bytes read are 00 80 00 00
bytes written back were 01 02 03 04
(May-07-2018, 04:19 PM)DeaD_EyE Wrote: If you're reading/processing binary data, you should take a look into struct. [...]
I tried using struct, though maybe I am missing something: it returns the ASCII value and not the data value. From the above example, my 00 80 00 00 turns into 30's with a 38 thrown in.
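The struct code that produced this is not shown in the post, so the following is only a guess at the cause: bytes 30 and 38 are the ASCII codes for the characters '0' and '8', which is what ends up in a file when the text form of the hex digits is written instead of the raw bytes. A small sketch of the difference (runs on Python 2.7 and 3; the value 0x8000 is taken from the bytes quoted above):

```python
import binascii
import struct

size = 0x8000  # 32768, stored as 00 80 00 00 in little-endian order

raw = struct.pack('<I', size)     # the 4 raw bytes
as_text = binascii.hexlify(raw)   # 8 ASCII characters: the hex digits as text

print(repr(raw))
print(repr(as_text))
# Hex dump of the text form: each digit became its ASCII code.
print(binascii.hexlify(as_text))
```

Writing raw to a file opened in 'wb' mode gives the four bytes 00 80 00 00; writing as_text gives eight bytes, 30 30 38 30 30 30 30 30.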
#5
It's good to know which datatype it should be.
Which format is the texture?

In [8]: struct.unpack('<i', b'\x00\x80\x00\x00')   # < is little endian
Out[8]: (32768,)
In [9]: struct.unpack('>i', b'\x00\x80\x00\x00')   # > is big endian, i is a signed 4-byte integer
Out[9]: (8388608,)
You open the file in binary mode, seek to the right position, then read the data.

import struct

def read_size(file):
    """
    This function returns the size of a texture in bytes.
    """
    with open(file, 'rb') as fd:
        fd.seek(1337) # go to the right offset, 1337 is just an example
        size = fd.read(4) # read the 4 raw bytes holding the size
    # '<i' = little-endian signed 32-bit integer; unpack returns a tuple
    return struct.unpack('<i', size)[0]
#6
(May-07-2018, 04:40 PM)medievil Wrote: possibly because you are working on python 3 and I am on 2.7, but the first example didn't work [...]

You are right about the from_bytes part: it does not exist in Python 2. I hope there is a good reason to work with Python 2; otherwise, for new development I strongly recommend using Python 3.

About your code, be careful, you are not testing what you think. Try with:
import os
f = open("y:/test1/canyon(msncyn).mtx", 'rb')
f.seek(48)
a = f.read(4)
f.close()
print('I read from file ' + str(a))

a = b'\01\02\03\04' 
# Important! The inversion is not done in place!
b = a[::-1]
w = open("y:/test1/test.mtx", 'wb')
w.write(b)
w.close()
#7
(May-07-2018, 08:34 PM)DeaD_EyE Wrote: It's good to know which datatype it should be. Which format is the texture? [...]
They are DXT1 and DXT5 DDS textures... BUT the headers are non-standard...


[inline]
Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 73 6D 63 61 6E 79 6F 6E 32 00 00 00 00 00 00 00 smcanyon2.......
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000020 00 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 ................
00000030 00 80 00 00 08 00 00 00 41 00 00 00 01 00 00 00 .€......A.......
[/inline]
That's the header before the data starts at 00000040.
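As a rough sketch of how this header could be used (runs on Python 2.7 and 3): the bytes below are copied from the hex dump above, and the code assumes, as described earlier in the thread, that the 4 bytes at offset 0x30 hold the data size in little-endian order and that the next texture starts right after the data. The constant names are made up for illustration, and the real layout of the format is not known.

```python
import binascii
import struct

# The 64 header bytes from the hex dump above.
header = binascii.unhexlify(
    '736D63616E796F6E3200000000000000'
    '00000000000000000000000000000000'
    '00010000000100000000000000000000'
    '00800000080000004100000001000000'
)

HEADER_SIZE = 0x40   # data starts at 00000040
SIZE_OFFSET = 0x30   # assumed position of the data size field

# '<I' = little-endian unsigned 32-bit integer; no manual byte swap needed.
(data_size,) = struct.unpack_from('<I', header, SIZE_OFFSET)

# Assumption: the next texture follows immediately after this one's data.
next_texture = HEADER_SIZE + data_size

# The name is the first 16 bytes, cut at the first null byte.
name = header[:16].split(b'\x00', 1)[0]

print(name, data_size, next_texture)
```

The bytes 00 80 00 00 come out as 32768, so the value can be used directly as an offset without ever writing the swapped bytes anywhere.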

(May-07-2018, 09:53 PM)killerrex Wrote: About your code, be careful, you are not testing what you think. [...]
The "I read from file" line prints Ç.
The test file has 04 03 02 01 instead of 00 00 80 00 (the inverse of the 00 80 00 00 that was read).

I am on 2.7 for learning purposes... moving to 3 later will be much easier if I learn 2.7 really well
new to all of it... except for some 8 bit assembly back in the day and lots and lots of Basic back in high school..lol
#8
(May-08-2018, 12:53 AM)medievil Wrote: I am on 2.7 for learning purposes... moving to 3 later will be much easier if I learn 2.7 really well
new to all of it... except for some 8 bit assembly back in the day and lots and lots of Basic back in high school..lol

In reality it is exactly the opposite. Python 2.7 is full of quirks and tricks accumulated over the years, with many traps that you need to learn the hard way. In Python 3 things are much more uniform and organised around well-defined concepts, so learning is much easier.
In particular, working with binary files is much easier in Python 3 because the bytes type is on its own a sequence of values between 0 and 255 and is not mixed up with strings (especially relevant the first time you work with something outside the ASCII table).
And remember that Python 2.7 is receiving only bug fixes and will be out of maintenance in less than 2 years...
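One place where the difference shows up is indexing: in Python 3, indexing a bytes object gives an int, while in Python 2.7 it gives a one-character string. A small sketch, assuming you want code that behaves the same on both versions, is to use bytearray, which holds integers between 0 and 255 in either case:

```python
data = bytearray(b'\x00\x80\x00\x00')

# Indexing a bytearray gives an int on both Python 2.7 and Python 3.
second = data[1]
print(second)

# A bytearray is mutable, so it can be reversed in place.
data.reverse()
print(list(data))
```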

I have rewritten your example so it works in both Python 2 and 3:
#!/usr/bin/env python3

# The contents of source.bin are the 1st 64 bytes:
# ~> hexdump -C source.bin 
# 00000000  00 01 02 03 04 05 06 07  08 09 0a 0b 0c 0d 0e 0f  |................|
# 00000010  10 11 12 13 14 15 16 17  18 19 1a 1b 1c 1d 1e 1f  |................|
# 00000020  20 21 22 23 24 25 26 27  28 29 2a 2b 2c 2d 2e 2f  | !"#$%&'()*+,-./|
# 00000030  30 31 32 33 34 35 36 37  38 39 3a 3b 3c 3d 3e 3f  |0123456789:;<=>?|
with open("source.bin", 'rb') as fd:
    # At byte 4 is 04 05 06 07
    fd.seek(4)
    a = fd.read(4)

print('I read from file {!r}'.format(a))

a = b'\01\02\03\04' 
# Important! The inversion is not done in place!
b = a[::-1]

with open("output.bin", 'w+b') as fd:
    fd.write(b)
# And the result:
# ~> hexdump -C output.bin
# 00000000  04 03 02 01                                       |....|
I have also created a small example using struct to read the header you posted. Obviously I have no idea of the real field types, so I am assuming that it is a 16-byte string (null terminated as in C) + 8 signed 32-bit integers + 2 floats + 1 double:
import struct

with open('demo.bin', 'rb') as fd:
    raw = fd.read()
print(raw)

# From here on raw *MUST* be exactly 64 bytes long, or struct will complain,
# as the input bytes must match the pattern exactly... split your input wisely
# As big endian (16s8i2fd = 16-byte string, 8 ints, 2 floats, 1 double)
big = struct.unpack('>16s8i2fd', raw)
print('Big:', big)

# As little endian:
little = struct.unpack('<16s8i2fd', raw)
print('Little:', little)
Output:
b'smcanyon2\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80\x00\x00\x08\x00\x00\x00A\x00\x00\x00\x01\x00\x00\x00'
Big: (b'smcanyon2\x00\x00\x00\x00\x00\x00\x00', 0, 0, 0, 0, 65536, 65536, 0, 0, 1.1754943508222875e-38, 3.851859888774472e-34, 131072.00048828125)
Little: (b'smcanyon2\x00\x00\x00\x00\x00\x00\x00', 0, 0, 0, 0, 256, 256, 0, 0, 4.591774807899561e-41, 1.1210387714598537e-44, 2.121995823e-314)
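Because struct.unpack insists on an exact length, one way to "split your input wisely" is to ask struct how long the pattern is and slice the input accordingly. This is only a sketch using the layout guessed in the post (16-byte string, 8 ints, 2 floats, 1 double), which may not match the real format; the input bytes here are made up, and the code runs on Python 2.7 and 3.

```python
import struct

HEADER_FORMAT = '<16s8i2fd'   # guessed layout, little endian
header_size = struct.calcsize(HEADER_FORMAT)

# Made-up input: a 64-byte header followed by 10 bytes of fake texture data.
raw = b'smcanyon2' + b'\x00' * 55 + b'\xff' * 10

# Only hand struct the bytes that belong to the header.
fields = struct.unpack(HEADER_FORMAT, raw[:header_size])
payload = raw[header_size:]

# Strip the C-style null padding from the name field.
name = fields[0].split(b'\x00', 1)[0]

print(header_size, name, len(payload))
```

struct.unpack_from(HEADER_FORMAT, raw) is an alternative that reads from the start of a longer buffer without slicing.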