Python Forum
Search for multiple unknown 3 (2) Byte combinations in a file.
#7
This is my first pass.
import numpy as np
from numpy.lib.stride_tricks import as_strided

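def get_int24(bytes_):
    """Convert bytes to 24bit ints (assumes a little-endian host)."""
    # The uint32 view needs a length that is a multiple of 4.
    last_byte = bytes_.shape[0] - bytes_.shape[0] % 4
    bytes_ = bytes_[:last_byte]
    # Each element reads 4 bytes at a 3 byte stride, so the last read
    # must start no later than last_byte - 4 to stay inside the buffer.
    count = max((last_byte - 1) // 3, 0)
    int24 = as_strided(bytes_.view(np.uint32), strides=(3,), shape=(count,))
    # Mask off the fourth byte, keeping the low 24 bits.
    return int24 & 0x00ffffff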

# Load file and convert to 24bit ints.
# Named "data" so the builtin bytes type is not shadowed.
data = np.memmap('test.txt', dtype=np.dtype('u1'), mode='r')
int24 = get_int24(data)

# Throw away values that are not in range 0x8A0000...0x8F0000
inrange = int24[(int24 >= 0x8A0000) & (int24 < 0x8F0000)]

# Get counts for each value.  Save as tuple (count, hex value)
counts = [(count, hex(value)) for value, count in zip(*np.unique(inrange, return_counts=True))]

print(sorted(counts, reverse=True)[:10])
I am still unclear about the alignment of the data in the file. This code assumes the file consists of 24-bit integers, so each integer starts on a 3-byte boundary.
Output:
Bytes:  00 8C 00 8D 00 00
Offset: 0  1  2  3  4  5
My assumption is that 8C 00 8D is not a match because it does not start on a 3 byte boundary. 8D 00 00 is a match because it does.
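To make the alignment assumption concrete, here is a small sketch that decodes the six example bytes only at offsets 0 and 3. The helper name groups is made up for illustration, and the thread has not established which byte order the file uses, so both are shown. Note that 8D 00 00 only lands in the 0x8A0000...0x8F0000 range when the first byte is treated as the most significant one.

```python
# The six example bytes from above.
data = bytes.fromhex("008C008D0000")

def groups(buf, byteorder):
    """Decode non-overlapping 3-byte groups starting at offset 0 (hypothetical helper)."""
    end = len(buf) - len(buf) % 3  # ignore an incomplete trailing group
    return [int.from_bytes(buf[i:i + 3], byteorder) for i in range(0, end, 3)]

big = groups(data, "big")        # [0x008C00, 0x8D0000]
little = groups(data, "little")  # [0x008C00, 0x00008D]

# Only the aligned group at offset 3, read big-endian, is in range.
in_range = [hex(v) for v in big if 0x8A0000 <= v < 0x8F0000]
print(in_range)  # ['0x8d0000']
```

The unaligned sequence 8C 00 8D at offset 1 is never examined, because only offsets 0 and 3 are decoded.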

This solution is brittle. Each 24-bit value is read through a 4-byte uint32 view, so the buffer is first trimmed to a multiple of 4 bytes and any remaining bytes are dropped. The final 3-byte group is also a problem: the upcast from bytes to int needs a fourth byte, and at the end of the file there isn't one. To solve this, I think you have to make the last 3 bytes a special case. Since your file is only a megabyte, it might not be worth the hassle. Reading the bytes and using int.from_bytes() might be fast enough. Something like this:
def int24(bytes_, endian='little'):
    """Convert bytes to 24bit ints.  Return numpy array of ints."""
    # Stop before any incomplete trailing group of 1 or 2 bytes.
    end = len(bytes_) - len(bytes_) % 3
    return np.array([int.from_bytes(bytes_[x:x+3], endian) for x in range(0, end, 3)])
This takes only about a second to process a 1 MB file.
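If the pure Python loop turns out to be too slow, one possible middle ground is to stay in NumPy but avoid the 4-byte read entirely: reshape the bytes into rows of 3 and combine the columns with shifts. This is a different technique from the as_strided first pass, offered only as a sketch. The function name int24_reshape and its byteorder parameter are made up for illustration, and trailing bytes that do not fill a complete group are assumed to be safe to ignore.

```python
import numpy as np

def int24_reshape(buf, byteorder="little"):
    """Decode 3-byte groups from a bytes-like object into uint32 values (hypothetical helper)."""
    data = np.frombuffer(buf, dtype=np.uint8)
    usable = data.size - data.size % 3           # drop an incomplete trailing group
    cols = data[:usable].reshape(-1, 3).astype(np.uint32)
    if byteorder == "little":
        # First byte is least significant.
        return cols[:, 0] | (cols[:, 1] << 8) | (cols[:, 2] << 16)
    # Otherwise treat the first byte as most significant.
    return (cols[:, 0] << 16) | (cols[:, 1] << 8) | cols[:, 2]

values = int24_reshape(bytes.fromhex("008C008D0000"), "big")
print([hex(v) for v in values])  # ['0x8c00', '0x8d0000']
```

Because no read ever reaches past a 3-byte group, the last group in the file needs no special case.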
Messages In This Thread
RE: Search for multiple unknown 3 (2) Byte combinations in a file. - by deanhystad - Aug-13-2023, 02:04 PM
