Posts: 8
Threads: 2
Joined: May 2018
May-07-2018, 04:30 AM
(This post was last modified: May-07-2018, 05:01 AM by buran.)
I can't seem to find a solution that works.
The program I am working on basically takes a bunch of specially formatted texture files and dumps them into one large file with a special header (and subheaders).
The issue I am running across seems really simple, but the solution is anything but.
I need to read a 4-byte location (easy enough) to get the data size, so I have to read that location and then byte swap it, i.e. read aa bb cc dd and turn it into dd cc bb aa, so I can use the file size as an offset to the next texture. Everything I try seems to turn it into the ASCII equivalent and doesn't use the actual hex data. I don't understand why using the raw data is so difficult. I have done exhaustive searches and tried array.array and struct.pack/unpack while changing the endianness.
Anyone got an easy method?
Posts: 116
Threads: 1
Joined: Apr 2018
What you are describing is easy to do with the bytes type:
>>> a = b'\01\02\03\04'
>>> a[::-1]
b'\x04\x03\x02\x01'
If your problem is that the numbers in the binary file are stored with a different endianness than your machine uses, you can use the from_bytes method of the int class:
>>> int.from_bytes(a, 'big')
16909060
>>> hex(int.from_bytes(a, 'big'))
'0x1020304'
>>> int.from_bytes(a, 'little')
67305985
>>> hex(int.from_bytes(a, 'little'))
'0x4030201'
The bytes type behaves almost like a string or a list, and is really practical for dealing with binary files. You might also want to look at the struct module.
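For comparison, the struct route gives the same numbers as int.from_bytes and also runs on Python 2.7, where from_bytes is not available. A minimal sketch reusing the same four bytes; the variable names are only illustrative:

```python
import struct

a = b'\x01\x02\x03\x04'   # same bytes as b'\01\02\03\04' above

# '>' selects big-endian, '<' selects little-endian, 'I' is an unsigned 32-bit integer.
big = struct.unpack('>I', a)[0]
little = struct.unpack('<I', a)[0]

print('big    = %d (%s)' % (big, hex(big)))        # big    = 16909060 (0x1020304)
print('little = %d (%s)' % (little, hex(little)))  # little = 67305985 (0x4030201)

# Reversing the bytes and switching the byte order cancel each other out.
assert struct.unpack('>I', a[::-1])[0] == little
```

So there is usually no need to reverse the bytes by hand: picking the right byte-order character in the format string does the swap.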
Posts: 2,128
Threads: 11
Joined: May 2017
May-07-2018, 04:19 PM
(This post was last modified: May-07-2018, 04:20 PM by DeaD_EyE.)
If you're reading/processing binary data, you should take a look at struct.
In addition, you can use third-party modules if your task is more complex.
https://pypi.org/project/bitarray/
There are similar modules like this. I haven't tried them yet.
Posts: 8
Threads: 2
Joined: May 2018
(May-07-2018, 08:21 AM)killerrex Wrote: What you are describing is easy to do with the bytes type:
The bytes type behaves almost as a string or a list, and is really practical to deal with binary files. You might want to look at the struct module.
Possibly because you are working on Python 3 and I am on 2.7, but the first example didn't work. I used:
import os
f = open("y:/test1/canyon(msncyn).mtx", 'rb')
f.seek(48)
a = f.read(4)
a = b'\01\02\03\04'
a[::-1]
w = open("y:/test1/test.mtx", 'wb')
w.write(a)
f.close
w.close
as a test case. The bytes read are 00 80 00 00, but the bytes written back were 01 02 03 04.
(May-07-2018, 04:19 PM)DeaD_EyE Wrote: If you're reading/processing binary data, you should take a look into struct.
I tried using struct, though maybe I am missing something: it returns the ASCII value and not the data value. From the above example, my 00 80 00 00 turns into 30s with a 38 thrown in.
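One possible explanation, offered as a guess because the struct code itself is not shown: 30 and 38 are the ASCII codes of the characters '0' and '8', which is what lands in the file if the number is turned into text (for example with hex() or string formatting) before it is written. The sketch below contrasts that with struct.pack, which produces the four raw bytes. The value comes from the bytes quoted in the post; everything else is illustrative.

```python
import binascii
import struct

size = struct.unpack('<I', b'\x00\x80\x00\x00')[0]   # 32768 when read as little-endian

# Text representation: eight ASCII characters, one byte per character.
as_text = ('%08x' % size).encode('ascii')             # the characters 00008000

# Binary representation: the four raw bytes, little-endian.
as_binary = struct.pack('<I', size)

print(binascii.hexlify(as_text))     # hex digits 30 30 30 30 38 30 30 30
print(binascii.hexlify(as_binary))   # hex digits 00 80 00 00
```

If the output file shows the first pattern, the fix would be to write the result of struct.pack (or the untouched bytes from read) instead of a formatted string.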
Posts: 2,128
Threads: 11
Joined: May 2017
It's good to know which datatype it should be.
Which format is the texture?
In [8]: struct.unpack('<i', b'\x00\x80\x00\x00')
Out[8]: (32768,)
In [9]: struct.unpack('>i', b'\x00\x80\x00\x00')
Out[9]: (8388608,)
You open the file in binary mode, seek to the position of the field, then read the data.
import struct

def read_size(file):
    with open(file, 'rb') as fd:
        fd.seek(1337)  # placeholder offset; use the real position of the size field
        size = fd.read(4)
    # '<i' = little-endian signed 32-bit integer
    return struct.unpack('<i', size)[0]
Posts: 116
Threads: 1
Joined: Apr 2018
(May-07-2018, 04:40 PM)medievil Wrote: possdibly because you are working on python 3 and I am on 2.7, but the first example didn't work.. I used
You are right about the from_bytes part: it does not exist in Python 2. I hope there is a good reason to work with Python 2; otherwise, for new development I strongly recommend using Python 3.
About your code, be careful: you are not testing what you think. Try with:
import os
f = open("y:/test1/canyon(msncyn).mtx", 'rb')
f.seek(48)
a = f.read(4)
f.close()
print('I read from file ' + str(a))
a = b'\01\02\03\04'
b = a[::-1]
w = open("y:/test1/test.mtx", 'wb')
w.write(b)
w.close()
Posts: 8
Threads: 2
Joined: May 2018
May-08-2018, 12:53 AM
(This post was last modified: May-08-2018, 01:29 AM by medievil.)
(May-07-2018, 08:34 PM)DeaD_EyE Wrote: It's good to know which datatype it should be.
Which format is the texture?
They are DXT1 and DXT5 DDS files, but the headers are non-standard:
Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000  73 6D 63 61 6E 79 6F 6E 32 00 00 00 00 00 00 00  smcanyon2.......
00000010  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000020  00 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00  ................
00000030  00 80 00 00 08 00 00 00 41 00 00 00 01 00 00 00  .€......A.......
That's the header; the data starts at offset 00000040.
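Given a header like this, using the size as an offset to the next texture could look roughly like the sketch below. It rests on several assumptions that are not confirmed here: every record is a 64-byte header followed directly by its payload, the size field sits at offset 0x30 and is little-endian, and the name occupies the first 16 bytes. The helper names iter_records and fake_record are made up for the example, and the data is built in memory rather than read from a real .mtx file.

```python
import io
import struct

HEADER_SIZE = 0x40   # assumption: the payload starts at offset 0x40
SIZE_OFFSET = 0x30   # assumption: the bytes 00 80 00 00 at 0x30 are the payload size

def iter_records(fd):
    """Yield (name, payload_offset, payload_size) for each header + payload record."""
    while True:
        header = fd.read(HEADER_SIZE)
        if len(header) < HEADER_SIZE:
            return                                   # end of file or truncated header
        name = header[:16].split(b'\x00', 1)[0]      # assumption: 16-byte NUL-padded name
        size = struct.unpack_from('<I', header, SIZE_OFFSET)[0]
        payload_offset = fd.tell()
        yield name, payload_offset, size
        fd.seek(payload_offset + size)               # jump over the payload to the next header

def fake_record(name, payload):
    """Build a made-up record with the assumed layout, for demonstration only."""
    header = (name.ljust(16, b'\x00') + b'\x00' * 32
              + struct.pack('<I', len(payload)) + b'\x00' * 12)
    return header + payload

data = io.BytesIO(fake_record(b'first', b'\xAA' * 8) + fake_record(b'second', b'\xBB' * 4))
records = list(iter_records(data))
for name, offset, size in records:
    print(name, offset, size)
```

The absolute seek (payload_offset + size) keeps the loop correct even if the caller reads part of the payload between iterations.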
(May-07-2018, 09:53 PM)killerrex Wrote: About your code, be careful, you are not testing what you think.
The "I read from file" output shows Ç.
The test file has 04 03 02 01 instead of 00 00 80 00 (the inverse of the 00 80 00 00 that was read).
I am on 2.7 for learning purposes; moving to 3 later will be much easier if I learn 2.7 really well.
I'm new to all of it, except for some 8-bit assembly back in the day and lots and lots of BASIC back in high school, lol.
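For reference, 04 03 02 01 is what the test script asks for: the line a = b'\01\02\03\04' replaces the bytes that were read before they are reversed. A minimal sketch of reversing the bytes that were actually read, using in-memory stand-ins (io.BytesIO) for the two files so that it runs anywhere:

```python
import io

# Stand-in for the real file: 48 filler bytes, then the 4-byte field 00 80 00 00.
source = io.BytesIO(b'\x00' * 48 + b'\x00\x80\x00\x00')
source.seek(48)
a = source.read(4)

# Slicing returns a new object and leaves a unchanged, so the result must be stored.
b = a[::-1]

target = io.BytesIO()     # stand-in for the output file opened with 'wb'
target.write(b)

print(repr(a))
print(repr(b))
```

With real files, the same three steps apply: read into a, keep the reversed copy in a second name, and write that copy.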
Posts: 116
Threads: 1
Joined: Apr 2018
May-08-2018, 08:16 AM
(This post was last modified: May-08-2018, 08:16 AM by killerrex.)
(May-08-2018, 12:53 AM)medievil Wrote: I am on 2.7 for learning purposes... moving to 3 later will be much easier if I learn 2.7 really well
new to all of it... except for some 8 bit assembly back in the day and lots and lots of Basic back in high school..lol
In reality it is exactly the opposite. Python 2.7 is full of quirks and tricks accumulated over the years, with many traps that you need to learn the hard way. In Python 3 things are much more uniform and organised around well-defined concepts, so learning is much easier.
In particular, working with binary files is much easier in Python 3 because the bytes type is, on its own, a sequence of values between 0 and 255 and is not mixed up with strings (especially relevant the first time you work with something outside the ASCII table).
And remember that Python 2.7 is receiving only bug fixes and will be out of maintenance in less than 2 years...
I have rewritten your example so it works in both Python 2 and 3:
with open("source.bin", 'rb') as fd:
    fd.seek(4)
    a = fd.read(4)

print('I read from file {!r}'.format(a))

a = b'\01\02\03\04'
b = a[::-1]

with open("output.bin", 'w+b') as fd:
    fd.write(b)
I have also created a small example using struct to read the header you posted. Obviously I have no idea of the real field types, so I am assuming that it is a 16-byte string (null-terminated as in C) + 8 signed 32-bit integers + 2 floats + 1 double:
import struct

with open('demo.bin', 'rb') as fd:
    raw = fd.read()

print(raw)

# 16-byte string + 8 signed 32-bit ints + 2 floats + 1 double = 64 bytes
big = struct.unpack('>16s8i2fd', raw)
print('Big:', big)

little = struct.unpack('<16s8i2fd', raw)
print('Little:', little)
Output: b'smcanyon2\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80\x00\x00\x08\x00\x00\x00A\x00\x00\x00\x01\x00\x00\x00'
Big: (b'smcanyon2\x00\x00\x00\x00\x00\x00\x00', 0, 0, 0, 0, 65536, 65536, 0, 0, 1.1754943508222875e-38, 3.851859888774472e-34, 131072.00048828125)
Little: (b'smcanyon2\x00\x00\x00\x00\x00\x00\x00', 0, 0, 0, 0, 256, 256, 0, 0, 4.591774807899561e-41, 1.1210387714598537e-44, 2.121995823e-314)
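Since the dump yields only small whole numbers when read as little-endian integers, another option is to treat everything after the name as unsigned 32-bit integers instead of floats. The sketch below does that with the 64 header bytes from the hex dump. The field meanings (width, height, size) are guesses; the only supporting evidence is that the arithmetic for a DXT1 texture, which stores each 4x4 pixel block in 8 bytes, happens to match.

```python
import struct

# The 64 header bytes from the hex dump posted earlier.
raw = (b'smcanyon2' + b'\x00' * 7      # 0x00: name, padded with NULs
       + b'\x00' * 16                  # 0x10: all zero
       + b'\x00\x01\x00\x00' * 2       # 0x20: 256, 256
       + b'\x00' * 8                   # 0x28: all zero
       + b'\x00\x80\x00\x00'           # 0x30: 32768
       + b'\x08\x00\x00\x00'           # 0x34: 8
       + b'\x41\x00\x00\x00'           # 0x38: 65
       + b'\x01\x00\x00\x00')          # 0x3C: 1

# 16-byte string followed by twelve little-endian unsigned 32-bit integers.
fields = struct.unpack('<16s12I', raw)
name = fields[0].rstrip(b'\x00')
values = fields[1:]
print(name)
print(values)

# Assumption: the two 256s are width and height, and the value at 0x30 is the data size.
width, height, size = values[4], values[5], values[8]
assert (width // 4) * (height // 4) * 8 == size   # 64 * 64 * 8 == 32768
```

If that reading is right, the file is little-endian and the size can be read directly with '<I', with no manual byte swapping.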