Posts: 8
Threads: 2
Joined: May 2018
May-07-2018, 04:30 AM
(This post was last modified: May-07-2018, 05:01 AM by buran.)
I can't seem to find a solution that works.
The program I am working on basically takes a bunch of specially formatted texture files and dumps them into one large file with a special header (and subheaders).
The issue I am running across seems really simple, but the solution is anything but.
I need to read a 4-byte location (easy enough) to get the data size, so I have to read that location and then byte swap it, i.e. read aa bb cc dd and turn it into dd cc bb aa, so I can use the file size as an offset to the next texture. Everything I try seems to turn it into the ASCII equivalent and doesn't use the actual hex data. I don't understand why using the raw data is so difficult. I have done exhaustive searches and tried array.array and struct.pack/unpack while changing the endianness.
Anyone got an easy method?
Posts: 116
Threads: 1
Joined: Apr 2018
What you are describing is easy to do with the bytes type:
# This can be the result from
# with open('file.bin', 'rb') as fd:
# a = fd.read(4)
>>> a = b'\01\02\03\04'
>>> a[::-1]
b'\x04\x03\x02\x01'
If your problem is that the numbers in the binary file are stored with a different endianness than your machine uses, you can use the from_bytes method of the int class:
>>> int.from_bytes(a, 'big')
16909060
>>> hex(int.from_bytes(a, 'big'))
'0x1020304'
>>> int.from_bytes(a, 'little')
67305985
>>> hex(int.from_bytes(a, 'little'))
'0x4030201'
The bytes type behaves almost like a string or a list, and is really practical for dealing with binary files. You might also want to look at the struct module.
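To illustrate the "behaves almost like a list" remark, here is a minimal sketch assuming Python 3 (in Python 2.7, indexing a byte string returns a one-character string instead of an int). The variable names are only for illustration.

```python
# Python 3: bytes is an immutable sequence of integers in the range 0..255.
a = b'\x01\x02\x03\x04'

first = a[0]        # indexing yields an int, not a one-character string
as_list = list(a)   # the individual byte values as a list of ints
swapped = a[::-1]   # slicing returns a new bytes object; a itself is unchanged

# bytearray is the mutable counterpart and can be reversed in place.
buf = bytearray(a)
buf.reverse()
```

Because bytes is immutable, `a[::-1]` has to be assigned to a name to be of any use; only bytearray offers an in-place reverse.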
Posts: 2,128
Threads: 11
Joined: May 2017
May-07-2018, 04:19 PM
(This post was last modified: May-07-2018, 04:20 PM by DeaD_EyE.)
If you're reading/processing binary data, you should take a look into struct.
In addition you can use third party modules, if your task is more complex.
https://pypi.org/project/bitarray/
There are similar modules like this. I haven't tried them yet.
Posts: 8
Threads: 2
Joined: May 2018
(May-07-2018, 08:21 AM)killerrex Wrote: What you are describing is easy to do with the bytes type: [...]
Possibly because you are working on Python 3 and I am on 2.7, but the first example didn't work. I used:
import os
f = open("y:/test1/canyon(msncyn).mtx", 'rb')
f.seek(48)
a = f.read(4)
a = b'\01\02\03\04'
a[::-1]
w = open("y:/test1/test.mtx", 'wb')
w.write(a)
f.close
w.close
as a test case. The bytes read were 00 80 00 00, but the bytes written back were 01 02 03 04.
(May-07-2018, 04:19 PM)DeaD_EyE Wrote: If you're reading/processing binary data, you should take a look into struct.
In addition you can use third party modules, if your task is more complex.
https://pypi.org/project/bitarray/
There are similar modules like this. I haven't tried them yet.
I tried using struct, though maybe I am missing something: it returns the ASCII value and not the data value. From the above example, my 00 80 00 00 turns into 30's with a 38 thrown in.
Posts: 2,128
Threads: 11
Joined: May 2017
It's good to know which datatype it should be.
Which format is the texture?
In [8]: struct.unpack('<i', b'\x00\x80\x00\x00') # < is little endian
Out[8]: (32768,)
In [9]: struct.unpack('>i', b'\x00\x80\x00\x00') # > is big endian, i is
Out[9]: (8388608,)
You open the file in binary mode, seek to the right position, then read the data:
import struct

def read_size(file):
    """
    This function returns the size of a texture in bytes.
    """
    with open(file, 'rb') as fd:
        fd.seek(1337)  # go to the right offset, 1337 is just an example
        size = fd.read(4)  # read the size
    return struct.unpack('<i', size)[0]
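The same idea can be sketched as a small helper that takes the offset and byte order as parameters. This is only an illustration: `read_u32` is a hypothetical name, the size field is assumed to be unsigned (format `I` rather than `i`), and an in-memory `io.BytesIO` stands in for the real file. It uses only `struct`, so it should run on both Python 2.7 and Python 3.

```python
import io
import struct

def read_u32(fd, offset, little_endian=True):
    """Read an unsigned 32-bit integer stored at `offset` in a binary file object."""
    fd.seek(offset)
    raw = fd.read(4)
    if len(raw) != 4:
        raise ValueError('expected 4 bytes at offset %d, got %d' % (offset, len(raw)))
    fmt = '<I' if little_endian else '>I'
    return struct.unpack(fmt, raw)[0]

# Stand-in for a real file: 48 padding bytes followed by 00 80 00 00.
demo = io.BytesIO(b'\x00' * 48 + b'\x00\x80\x00\x00')
size_le = read_u32(demo, 48)                       # interpreted as little endian
size_be = read_u32(demo, 48, little_endian=False)  # interpreted as big endian

# To write the number back as 4 raw bytes, use struct.pack rather than str().
swapped = struct.pack('>I', size_le)
```

Packing the little-endian value as big endian gives the byte-swapped field, which is the same result as reversing the four raw bytes.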
Posts: 116
Threads: 1
Joined: Apr 2018
(May-07-2018, 04:40 PM)medievil Wrote: possibly because you are working on python 3 and I am on 2.7, but the first example didn't work.. I used [...]
You are right about the from_bytes part: int.from_bytes (like int.to_bytes) does not exist in Python 2. I hope there is a good reason to work with Python 2; otherwise, for new development I strongly recommend using Python 3.
About your code, be careful: you are not testing what you think you are. Try this instead:
import os
f = open("y:/test1/canyon(msncyn).mtx", 'rb')
f.seek(48)
a = f.read(4)
f.close()
print('I read from file ' + str(a))
a = b'\01\02\03\04'
# Important! The inversion is not in place!
b = a[::-1]
w = open("y:/test1/test.mtx", 'wb')
w.write(b)
w.close()
Posts: 8
Threads: 2
Joined: May 2018
May-08-2018, 12:53 AM
(This post was last modified: May-08-2018, 01:29 AM by medievil.)
(May-07-2018, 08:34 PM)DeaD_EyE Wrote: It's good to know which datatype it should be.
Which format is the texture? [...]
They are DXT1 and DXT5 DDS files, but the headers are non-standard:
Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 73 6D 63 61 6E 79 6F 6E 32 00 00 00 00 00 00 00 smcanyon2.......
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000020 00 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 ................
00000030 00 80 00 00 08 00 00 00 41 00 00 00 01 00 00 00 .€......A.......
That's the header; the data starts at 00000040.
(May-07-2018, 09:53 PM)killerrex Wrote: About your code, be careful, you are not testing what you think. Try with: [...]
The "I read from file" line prints Ç.
The test file has 04 03 02 01 instead of 00 00 80 00 (the inverse of the 00 80 00 00 that was read).
I am on 2.7 for learning purposes. Moving to 3 later will be much easier if I learn 2.7 really well.
I'm new to all of it, except for some 8-bit assembly back in the day and lots and lots of BASIC back in high school, lol.
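The 04 03 02 01 result is what you would expect if the variable holding the bytes read from the file is reassigned to the literal b'\01\02\03\04' before the reversal. A minimal sketch of the test without that reassignment follows. The temporary files and their contents are made up for the demonstration (48 filler bytes followed by 00 80 00 00), not taken from the real .mtx file.

```python
import os
import tempfile

# Hypothetical stand-in for the .mtx file: 48 filler bytes, then 00 80 00 00.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, 'source.bin')
dst_path = os.path.join(workdir, 'swapped.bin')
with open(src_path, 'wb') as fd:
    fd.write(b'\xff' * 48 + b'\x00\x80\x00\x00')

with open(src_path, 'rb') as fd:
    fd.seek(48)
    a = fd.read(4)   # keep what was read; do not reassign a afterwards

b = a[::-1]          # a new object holding the reversed bytes

with open(dst_path, 'wb') as fd:
    fd.write(b)

# Read the output back to check what was actually written.
with open(dst_path, 'rb') as fd:
    result = fd.read()
```

Here the output file holds 00 00 80 00, the reverse of the bytes that were read. The `with` blocks also close the files, which a bare `f.close` without parentheses does not do.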
Posts: 116
Threads: 1
Joined: Apr 2018
May-08-2018, 08:16 AM
(This post was last modified: May-08-2018, 08:16 AM by killerrex.)
(May-08-2018, 12:53 AM)medievil Wrote: I am on 2.7 for learning purposes... moving to 3 later will be much easier if I learn 2.7 really well
new to all of it... except for some 8 bit assembly back in the day and lots and lots of Basic back in high school..lol
In reality it is exactly the opposite. Python 2.7 is full of quirks and tricks accumulated over the years, with many traps that you need to learn the hard way. In Python 3 things are much more uniform and organised around well-defined concepts, so learning is much easier.
In particular, working with binary files is much easier in Python 3 because the bytes type is on its own a sequence of values between 0 and 255 and is not mixed up with strings (especially relevant the first time you work with something outside the ASCII table).
And remember that Python 2.7 is receiving only bug fixes and will be out of maintenance in less than 2 years...
I have rewritten your example so that it works in both Python 2 and 3:
#!/usr/bin/env python3
# The contents of source.bin are the 1st 64 bytes:
# ~> hexdump -C source.bin
# 00000000 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f |................|
# 00000010 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f |................|
# 00000020 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f | !"#$%&'()*+,-./|
# 00000030 30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f |0123456789:;<=>?|
with open("source.bin", 'rb') as fd:
    # At byte 4 is 04 05 06 07
    fd.seek(4)
    a = fd.read(4)
print('I read from file {!r}'.format(a))
a = b'\01\02\03\04'
# Important! The inversion is not in place!
b = a[::-1]
with open("output.bin", 'w+b') as fd:
    fd.write(b)
# And the result:
# ~> hexdump -C output.bin
# 00000000 04 03 02 01 |....|
I have also created a small example using struct to read the header you posted. Obviously I have no idea of the real field types, so I am assuming that it is a 16-byte string (null terminated as in C) + 8 signed 32-bit integers + 2 floats + 1 double:
import struct

with open('demo.bin', 'rb') as fd:
    raw = fd.read()
print(raw)
# From here on, raw *MUST* be exactly 64 bytes long, or struct will complain,
# as the input bytes must match the pattern exactly... split your input wisely
# As big endian:
big = struct.unpack('>16s8i2fd', raw)
print('Big:', big)
# As little endian:
little = struct.unpack('<16s8i2fd', raw)
print('Little:', little)
Output:
b'smcanyon2\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80\x00\x00\x08\x00\x00\x00A\x00\x00\x00\x01\x00\x00\x00'
Big: (b'smcanyon2\x00\x00\x00\x00\x00\x00\x00', 0, 0, 0, 0, 65536, 65536, 0, 0, 1.1754943508222875e-38, 3.851859888774472e-34, 131072.00048828125)
Little: (b'smcanyon2\x00\x00\x00\x00\x00\x00\x00', 0, 0, 0, 0, 256, 256, 0, 0, 4.591774807899561e-41, 1.1210387714598537e-44, 2.121995823e-314)
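Since the original goal was to use the size field as an offset to the next texture, here is a rough sketch of how that walk might look. Everything about the layout is an assumption, not something the thread confirms: a 64-byte header per texture, a NUL-padded 16-byte name, a little-endian unsigned 32-bit data size at offset 48, and a size that counts only the data following the header. `HEADER`, `iter_textures` and `make_record` are hypothetical names, and the demo data is built in memory.

```python
import io
import struct

# Assumed layout (not confirmed by the thread): 16-byte NUL-padded name,
# 32 bytes skipped, little-endian uint32 data size at offset 48, 12 bytes skipped.
HEADER = struct.Struct('<16s32xI12x')

def iter_textures(fd):
    """Yield (name, data_offset, data_size) for each record in fd."""
    while True:
        raw = fd.read(HEADER.size)
        if len(raw) < HEADER.size:
            break  # end of file (or a truncated trailing header)
        name, size = HEADER.unpack(raw)
        name = name.split(b'\x00', 1)[0]  # drop the NUL padding
        data_offset = fd.tell()
        yield name, data_offset, size
        fd.seek(data_offset + size)  # jump over the texture data

def make_record(name, data):
    """Build one header + data record for the demo (hypothetical helper)."""
    return HEADER.pack(name, len(data)) + data

demo = io.BytesIO(make_record(b'smcanyon2', b'\xaa' * 32768) +
                  make_record(b'second', b'\xbb' * 8))
records = list(iter_textures(demo))
```

If the real size field turns out to include the header, or the fields sit elsewhere, only the format string and the seek arithmetic would need to change.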