Python Forum
Reading Binary File, Missing Occurences - Printable Version


Reading Binary File, Missing Occurences - Oliver - Jan-04-2018

I am recursively reading a set of binary files (some, not all) in a directory structure. These are now-ancient Visual FoxPro files of various types.

A tool like TextWrangler finds all the occurrences of the search string.

Yet my Python code (I'm still a semi-newbie) misses some, and I'm not sure why.

I've defined the search term like this: search_term = b'permissionlevel'

The code is pretty simple, but I'm missing something. I also experimented with reading the binary files in 1024-byte chunks (and other buffer lengths), but that did no better (worse, actually).

The Python code finds 374 matches of the search term, when the correct number is 430 (as shown by two other search programs).

If anyone can see what I might be doing wrong, I would really appreciate that. :)

Thanks,

-------------------------
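import os

search_term = b'permissionlevel'
# 'path' is the root directory to search; it is set elsewhere and not shown here.

fileCount = 0      # files visited
totalMatches = 0   # matching lines across all files
count = 0          # matching lines in the current file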

for dirName, subdirlist, filelist in os.walk(path):
    print('Found directory: ' + dirName)
    for file in filelist:
        fileCount += 1
        filePath = os.path.join(dirName, file)
        # print("\n***OPENING FILE***" + filePath + "\n")
        with open(filePath, 'rb') as f:
            lines = f.readlines()
            for line in lines:
                if line.upper().find(search_term.upper()) != -1:
                    totalMatches += 1
                    count += 1
                    print(line)
            if count > 0:
                print("Found " + str(count) + " match(es) in " + filePath)
            count = 0  # next file, please...

print("\n" + str(fileCount) + " files found.")
print("\n" + str(totalMatches) + " Total matches.")



RE: Reading Binary File, Missing Occurences - j.crater - Jan-04-2018

Please edit your post so that code is put within Python code tags. Also use ctrl+shift+v when pasting code, so that indentation is preserved.


RE: Reading Binary File, Missing Occurences - Oliver - Jan-04-2018

Found the python code tag! Cool

I will make sure to use it from now on.

Thanks.


RE: Reading Binary File, Missing Occurences - mpd - Jan-04-2018

If you're reading binary files, what does f.readlines() actually do? I mean, it's a binary file so there's no notion of "lines", it's just a sequence of bytes. If they're not too big, why not just read the entire thing into memory with f.read() and start from there?

Also, can you use string functions like upper() on bytes? What does that do to bytes that aren't valid characters?
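
To make both questions concrete, here is a small sketch. The sample bytes are made up for illustration and are not taken from the FoxPro files. In binary mode, readlines() still splits after every b'\n' byte, and bytes.upper() only changes the ASCII letters a-z, leaving every other byte value untouched. One plausible reason for an undercount, assuming some "lines" contain the term more than once, is that a per-line find() counts each such line only once.

```python
import io

# Made-up sample data: two occurrences before the first b'\n', one after it.
data = b'PermissionLevel\x00\xffpermissionlevel\nPERMISSIONLEVEL\x80'
search_term = b'permissionlevel'

# readlines() on a binary stream splits after each b'\n' byte.
lines = io.BytesIO(data).readlines()

# Per-line test, as in the original loop: each matching line counts once.
matching_lines = sum(1 for line in lines if search_term.upper() in line.upper())

# Counting occurrences over the whole buffer instead.
occurrences = data.upper().count(search_term.upper())

# bytes.upper() maps only ASCII a-z; other byte values are unchanged.
non_ascii_upper = b'\x00\xe9\xffabc'.upper()

print(len(lines), matching_lines, occurrences, non_ascii_upper)
```

With this sample, the per-line count comes out lower than the occurrence count, which is the same kind of gap as 374 versus 430. Whether that is what happens in the real files is only a guess.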


RE: Reading Binary File, Missing Occurences - Oliver - Jan-05-2018

(Jan-04-2018, 06:43 PM)mpd Wrote: If you're reading binary files, what does f.readlines() actually do? I mean, it's a binary file so there's no notion of "lines", it's just a sequence of bytes. If they're not too big, why not just read the entire thing into memory with f.read() and start from there?

Also, can you use string functions like upper() on bytes? What does that do to bytes that aren't valid characters?

I had experimented a bit with read(), but after your posting, I went back and got it working!

So, for each file, I now do:
  data = f.read()
Then I do a regex search:

matches = re.findall(search_term.upper(), data.upper())

The upper() calls are needed because the search has to be case-insensitive: I'm looking for text inside the binary file. Without upper(), I miss a hundred or so matches.
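
An alternative sketch of the same search, in case it is useful: instead of upper-casing both sides, the re module can match case-insensitively with re.IGNORECASE, and re.escape guards against a search term that happens to contain regex metacharacters. The helper name count_matches and the sample bytes are made up for illustration.

```python
import re

def count_matches(data: bytes, term: bytes) -> int:
    """Count non-overlapping, case-insensitive occurrences of term in data."""
    # re.escape keeps characters such as '.' or '(' in the term literal.
    # With a bytes pattern, IGNORECASE only folds ASCII letters.
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return len(pattern.findall(data))

# Made-up sample: the term in three different casings between arbitrary bytes.
sample = b'\x01PermissionLevel\x00PERMISSIONLEVEL\xfepermissionlevel'
print(count_matches(sample, b'permissionlevel'))
```

For a plain term like b'permissionlevel' this should give the same count as the upper() version; the escape step only matters if the term ever contains characters that are special in a regex.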

I also experimented with chunking the file, say with read(4096), but that is problematic because a chunk boundary can split the string you're searching for. It can still be done, but it takes extra code to make sure a match that straddles two reads isn't missed.
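
For what it's worth, a minimal sketch of that extra code, assuming the usual overlap trick: keep the last len(term) - 1 bytes of each window and prepend them to the next chunk, so a match split across two reads is still seen. The function name count_in_chunks is hypothetical, and the sketch assumes the term cannot overlap with itself (true for b'permissionlevel'); for a self-overlapping term the result could differ from a whole-file count.

```python
import io

def count_in_chunks(stream, term: bytes, chunk_size: int = 4096) -> int:
    """Count case-insensitive occurrences of term, reading chunk_size bytes at a time."""
    if not term:
        raise ValueError("term must not be empty")
    term = term.upper()
    overlap = len(term) - 1
    tail = b''
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Prepend the carried-over bytes so a match split across reads is whole again.
        window = tail + chunk.upper()
        total += window.count(term)
        # Carry at most len(term) - 1 bytes forward: too short to hold a full
        # match, so nothing already counted can be counted a second time.
        tail = window[-overlap:] if overlap else b''
    return total

# Made-up sample; a chunk size of 4 forces the term to be split across reads.
data = b'ab' + b'PermissionLevel' + b'\x00' * 10 + b'PERMISSIONLEVEL' + b'zz'
print(count_in_chunks(io.BytesIO(data), b'permissionlevel', chunk_size=4))
```

With a real file, the stream would be the object returned by open(filePath, 'rb'). Unless the files are very large, reading the whole file with f.read() is simpler.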

Thanks very much for your reply. It made the difference!

- O