Posts: 40
Threads: 17
Joined: Dec 2019
Hi all
I'd like to search a file for multiple unknown hex combinations and have absolutely no idea how to start.
The hex combinations are always 3 bytes long, and the second and third bytes will have different unknown values.
What I know is that the combinations to search for start with one of the following:
8c, 8d, 8e
so ac???? or 8d???? or 8e????
Now I want to return the ???? values only if there is also one or more ac???? or 8d???? or 8e???? or ee???? or ce???? in that file as well.
For example, if 8c21d0 is found anywhere in that file, then search for a second, third... 8c21d0 and also for
ee21d0 8d21d0 8e21d0 ce21d0
Once that is found, I'd like to skip returning values if the found bytes match e.g. 8d 12 d0 or 8d 22 d0 or 8d 00 04 etc.
I think storing the skip values in a list is the best solution here.
If the second and third bytes are not in that skip list, the function should return 21d0 in that case as the first found combination, but
continue searching for other possible matches and return them as well.
What would be the best solution / function to start such a project?
Posts: 6,790
Threads: 20
Joined: Feb 2020
Aug-12-2023, 06:00 PM
(This post was last modified: Aug-12-2023, 06:00 PM by deanhystad.)
Are they hex strings? If so, what is the delimiter?
(Aug-12-2023, 06:00 PM)deanhystad Wrote: Are they hex strings? If so, what is the delimiter?
No delimiter, raw binary files, max 1 MB in size.
(Aug-12-2023, 06:06 PM)lastyle Wrote: (Aug-12-2023, 06:00 PM)deanhystad Wrote: Are they hex strings? If so, what is the delimiter?
No delimiter, raw binary files, max 1 MB in size.
Maybe using the range function up to 0xff 0xff and then searching with that could be something to start with?
Aug-12-2023, 08:15 PM
(This post was last modified: Aug-13-2023, 12:08 PM by deanhystad.)
Q: Can the 8c/8d/8e appear anywhere, or only at offsets that are a multiple of 3 bytes?
Here's a way to convert an array of bytes into 24bit ints.
https://stackoverflow.com/questions/1208...1#34128171
Depending on the answer to my question you might need to do this 3 times to get all possible 24 bit integers. Once you have that, look for numbers in the range 0x8C0000 to 0x8F0000. This can be done quickly using numpy. Like this:
https://stackoverflow.com/questions/4503...nge-python
And you can use numpy.unique() to get a count of each unique value.
https://numpy.org/doc/stable/reference/g...nique.html
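As a minimal sketch of the range filter and the counting step, assuming the 24 bit values are already in a numpy array (the sample values below are made up for illustration, not taken from a real file):

```python
import numpy as np

# Hypothetical sample of 24-bit values; in practice these come from the file.
values = np.array(
    [0x8C21D0, 0x8D21D0, 0x123456, 0x8C21D0, 0x8F0000, 0x8E0004],
    dtype=np.uint32,
)

# Boolean mask keeps values whose top byte is 0x8C, 0x8D or 0x8E.
in_range = values[(values >= 0x8C0000) & (values < 0x8F0000)]

# np.unique returns the sorted unique values and how often each occurs.
unique_values, counts = np.unique(in_range, return_counts=True)
result = {hex(int(v)): int(c) for v, c in zip(unique_values, counts)}
print(result)
```

The upper bound is exclusive here, so 0x8F0000 itself is dropped.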
(Aug-12-2023, 08:15 PM)deanhystad Wrote: Can the 8c/8d/8e appear anywhere, or only at offsets that are a multiple of 3 bytes?
They can appear anywhere and also multiple times, or not even once.
Aug-13-2023, 02:04 PM
(This post was last modified: Aug-13-2023, 02:04 PM by deanhystad.)
This is my first pass.
import numpy as np
from numpy.lib.stride_tricks import as_strided

def get_int24(bytes_):
    """Convert bytes to 24bit ints."""
    last_byte = bytes_.shape[0] - bytes_.shape[0] % 4
    bytes_ = bytes_[:last_byte]
    count = last_byte // 3
    int24 = as_strided(bytes_.view(np.uint32), strides=(3,), shape=(count,))
    return int24 & 0x00ffffff

# Load file and convert to 24bit ints.
bytes = np.memmap('test.txt', dtype=np.dtype('u1'), mode='r')
int24 = get_int24(bytes)
# Throw away values that are not in range 0x8A0000...0x8F0000
inrange = int24[(int24 >= 0x8A0000) & (int24 < 0x8F0000)]
# Get counts for each value. Save as tuple (count, hex value)
counts = [(count, hex(value)) for value, count in zip(*np.unique((inrange), return_counts=True))]
print(sorted(counts, reverse=True)[:10])
I am still unclear about the alignment of the data in the file. This code assumes the file consists of 24bit integers, so each integer starts on a 3 byte boundary.
Bytes:  00 8C 00 8D 00 00
Offset:  0  1  2  3  4  5
My assumption is that 8C 00 8D is not a match because it does not start on a 3 byte boundary. 8D 00 00 is a match because it does.
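To make the alignment question concrete, here is a small pure-Python sketch using the six example bytes above. It only contrasts the two ways of reading the data; the names `aligned` and `sliding` are picked for illustration.

```python
data = bytes.fromhex("008C008D0000")

# Aligned reading: triples start only at offsets 0, 3, 6, ...
aligned = [data[i:i + 3].hex() for i in range(0, len(data) - 2, 3)]

# Sliding reading: a triple may start at every offset.
sliding = [data[i:i + 3].hex() for i in range(len(data) - 2)]

print(aligned)  # ['008c00', '8d0000']
print(sliding)  # ['008c00', '8c008d', '008d00', '8d0000']
```

Only the sliding reading sees 8C 00 8D, because that triple starts at offset 1.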
This solution is brittle. It only works if the file length is a multiple of 3 and 4. The upcast from bytes to int requires 4 bytes. If there are only 3 bytes remaining the upcast will fail. To solve, I think you have to make the last 3 bytes a special case. Since your file is only a megabyte, it might not be worth the hassle. Reading the bytes and using int.from_bytes() might be fast enough. Something like this:
def int24(bytes_, endian='little'):
    """Convert bytes to 24bit ints. Return numpy array of ints."""
    return np.array([int.from_bytes(bytes_[x:x+3], endian) for x in range(0, len(bytes_), 3)])
This only takes about a second to process a 1Mbyte file.
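One thing worth checking with int.from_bytes() is the byte order, since it decides which byte of the triple ends up as the high byte of the integer. A short sketch, assuming the 8C/8D/8E byte is the first byte of the triple in the file (`starts_with_target` is a hypothetical helper, not part of the code above):

```python
triple = bytes.fromhex("8c21d0")

as_big = int.from_bytes(triple, "big")        # first byte becomes the high byte
as_little = int.from_bytes(triple, "little")  # first byte becomes the low byte

def starts_with_target(value):
    """True if the high byte of a 24-bit value is 0x8C, 0x8D or 0x8E."""
    return 0x8C0000 <= value < 0x8F0000

print(hex(as_big), starts_with_target(as_big))
print(hex(as_little), starts_with_target(as_little))
```

With "little" the range test looks at the third byte of the triple instead of the first, so under this assumption "big" would be the order to pass.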
Aug-14-2023, 02:28 AM
(This post was last modified: Aug-14-2023, 02:10 PM by deanhystad.)
This is quick and robust (I think). It uses numpy reshape() and concatenate() to pad the 24 bit integers to 32 bits. Probably not as quick as using as_strided(), but only takes 0.006 seconds to process a 1Mbyte file.
import numpy as np
import sys

def int24(bytes_):
    """Convert bytes to 24bit ints. Return numpy array of ints."""
    # How many 3 byte ints are in bytes_?
    count = bytes_.shape[0] // 3
    # Reshape bytes_ into 3 byte arrays.
    bytes_ = bytes_[:count*3].reshape((count, 3))
    # Pad with zeros to make 4 byte arrays
    if sys.byteorder == "little":
        padded = np.concatenate((bytes_, np.zeros((count, 1), dtype=np.uint8)), axis=1)
    else:
        padded = np.concatenate((np.zeros((count, 1), dtype=np.uint8), bytes_), axis=1)
    # Convert 4 byte arrays to 4 byte ints
    return np.frombuffer(padded.tobytes(), dtype=np.uint32)

# Load file and convert to 24bit ints.
bytes_ = np.fromfile('test.txt', dtype=np.uint8)
asints = int24(bytes_)
# Throw away values that are not in range 0x8A0000...0x8F0000
inrange = asints[(asints >= 0x8A0000) & (asints < 0x8F0000)]
# Get counts for each value. Save as tuple (count, hex value)
counts = [(count, hex(value)) for value, count in zip(*np.unique((inrange), return_counts=True))]
print(sorted(counts, reverse=True)[:10])
And if the 8C/8D/8E can be anywhere in the file, at any offset, just shift the bytes_ array and resample.
# Load file and convert to 24bit ints. Shift the
# starting point to get all 24 bit ints.
bytes_ = np.fromfile('test.txt', dtype=np.uint8)
asints = np.concatenate(
    (int24(bytes_), int24(bytes_[1:]), int24(bytes_[2:]))
)
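For the skip list and the "found more than once" condition from the original question, a rough pure-Python sketch could look like the following. This is one reading of the requirement, not a tested solution for the real files. It assumes an operand is reported only if it appears at least twice in total after any of the opcodes 8c, 8d, 8e, ee, ce. The name `find_operands`, the two opcode sets, and the sample bytes are made up for illustration.

```python
from collections import Counter

# Assumed opcode sets, based on the values listed in the question.
START_OPCODES = {0x8C, 0x8D, 0x8E}
RELATED_OPCODES = START_OPCODES | {0xEE, 0xCE}

def find_operands(data, skip=frozenset()):
    """Return 2-byte operands as hex strings, in order of first appearance."""
    hits = Counter()   # operand -> number of triples with a related opcode
    first_seen = {}    # operands that follow a start opcode (dict keeps order)
    for i in range(len(data) - 2):  # a triple may start at any offset
        opcode = data[i]
        if opcode not in RELATED_OPCODES:
            continue
        operand = data[i + 1:i + 3].hex()
        hits[operand] += 1
        if opcode in START_OPCODES:
            first_seen.setdefault(operand, None)
    return [op for op in first_seen if hits[op] >= 2 and op not in skip]

# Made-up sample: 8c21d0, a filler byte, ee21d0, 8d12d0 twice, 8e3344 once.
sample = bytes.fromhex("8c21d0" "00" "ee21d0" "8d12d0" "8d12d0" "8e3344")
print(find_operands(sample, skip={"12d0"}))
print(find_operands(sample))
```

Because every offset is examined, an operand byte that happens to equal one of the opcodes can produce a false match. If that matters for your files, the hits would need extra filtering.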