Python Forum
Reading Binary File, Missing Occurrences
#1
I am recursively reading a set of binary files (some, not all) in a directory tree. They are now-ancient Visual FoxPro files of various types.

A tool like TextWrangler finds all the occurrences of the search string.

Yet my Python code (I'm still a semi-newbie) misses some, and I'm not sure why.

I've defined the search term like this: search_term = b'permissionlevel'

The code is pretty simple, but I'm missing something. I also experimented with reading the binary files in 1024-byte chunks (and other buffer lengths), but that did no better (worse, actually).

The Python code finds 374 matches of the search term, when the correct number is 430 (as shown by two other search programs).

If anyone can see what I might be doing wrong, I would really appreciate that. :)

Thanks,

-------------------------
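import os

path = '.'  # placeholder: root directory to scan (not shown in the original post)
search_term = b'permissionlevel'

fileCount = 0
totalMatches = 0
count = 0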

for dirName, subdirlist, filelist in os.walk(path):
    print('Found directory: ' + dirName)
    for file in filelist:
        fileCount += 1
        filePath = os.path.join(dirName, file)
        # print("\n***OPENING FILE***" + filePath + "\n")
        with open(filePath, 'rb') as f:
            lines = f.readlines()
            for line in lines:
                if line.upper().find(search_term.upper()) != -1:
                    totalMatches += 1
                    count += 1
                    print(line)
            if (count > 0):
                print("Found " + str(count) + " match(es) in " + filePath)
            count = 0  # next file, please...

print("\n" + str(fileCount) + " files found.")
print("\n" + str(totalMatches) + " Total matches.")
#2
Please edit your post so that the code is inside Python code tags. Also use Ctrl+Shift+V when pasting code, so that indentation is preserved.
#3
Found the Python code tag! Cool.

I will make sure to use it from now on.

Thanks.
#4
If you're reading binary files, what does f.readlines() actually do? I mean, it's a binary file so there's no notion of "lines", it's just a sequence of bytes. If they're not too big, why not just read the entire thing into memory with f.read() and start from there?

Also, can you use string functions like upper()? What does that do to bytes that aren't valid characters?
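
To make those two questions concrete, here is a small sketch using made-up sample bytes (the data below is hypothetical, not taken from the FoxPro files). It suggests that readlines() in binary mode still splits, but only on b'\n', so a per-line test counts a "line" once even when it holds several occurrences, and that bytes.upper() only changes ASCII letters.

```python
import io

# Hypothetical sample: two occurrences sit before the first newline byte.
data = b'PermissionLevel\x00\x01permissionlevel\nother\xff\xe9bytes\n'
search_term = b'permissionlevel'

# In binary mode, readlines() splits on b'\n' bytes and nothing else.
lines = io.BytesIO(data).readlines()

# Per-line test, as in the loop from post #1: one hit per matching line.
line_hits = sum(1 for line in lines if search_term.upper() in line.upper())

# Counting occurrences in the whole buffer instead.
occurrence_hits = data.upper().count(search_term.upper())

# bytes.upper() maps ASCII a-z to A-Z; every other byte value passes through.
untouched = b'\xff\xe9\x00'.upper()

print(len(lines), line_hits, occurrence_hits, untouched)
```

If the real files behave like this sample, counting matching lines would undercount whenever one stretch between newline bytes contains the term more than once.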
#5
(Jan-04-2018, 06:43 PM)mpd Wrote: If you're reading binary files, what does f.readlines() actually do? I mean, it's a binary file so there's no notion of "lines", it's just a sequence of bytes. If they're not too big, why not just read the entire thing into memory with f.read() and start from there?

Also, can you use string functions like upper()? What does that do to bytes that aren't valid characters?

I had experimented a bit with read(), but after your post, I went back and got it working!

So, for each file, I now read the whole thing:
  data = f.read()
Then I do a regex search:

matches = re.findall(search_term.upper(), data.upper())
The upper() calls make the search case-insensitive, which matters because the text in these files is not consistently cased. Without upper(), I miss a hundred or so matches.
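
For anyone landing here later, this is roughly how the whole-file version could look. It is a sketch, not my exact script: the function name count_matches and its return shape are made up for illustration, and it swaps the two upper() calls for re.IGNORECASE (which, for bytes patterns, only folds ASCII letters) plus re.escape so the term is treated literally.

```python
import os
import re


def count_matches(root, term):
    """Return (file_count, total_matches, per_file) for bytes `term` under `root`.

    Hypothetical helper: reads each file fully into memory, so it assumes
    the files are small enough for that.
    """
    # re.escape keeps the term literal; IGNORECASE replaces the upper() calls.
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    file_count = 0
    total = 0
    per_file = {}
    for dir_name, _subdirs, files in os.walk(root):
        for name in files:
            file_count += 1
            file_path = os.path.join(dir_name, name)
            with open(file_path, 'rb') as f:
                data = f.read()
            # findall counts every occurrence, not every matching "line".
            n = len(pattern.findall(data))
            if n:
                per_file[file_path] = n
                total += n
    return file_count, total, per_file
```

Calling count_matches(path, b'permissionlevel') and printing the three results gives the same kind of summary as the original loop.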

I also experimented with chunking the file, say with read(4096), but that is problematic because a read boundary can fall in the middle of the string you're searching for. It can still be done, but it needs extra code to handle a match that is split across two reads.
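
One way that extra code might look, as a rough sketch: carry the last len(term) - 1 bytes of each buffer into the next read, so a match straddling a boundary is seen whole. The name count_in_chunks is hypothetical, and the sketch assumes the term cannot overlap itself (true for b'permissionlevel'); for a term like b'aa' the count may differ from bytes.count() on the whole file.

```python
import io


def count_in_chunks(f, term, chunk_size=4096):
    """Count case-insensitive occurrences of bytes `term` in binary file object `f`."""
    if not term:
        raise ValueError('term must not be empty')
    term = term.upper()
    keep = len(term) - 1  # longest partial match that can sit at the end of a buffer
    tail = b''
    total = 0
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buffer = tail + chunk.upper()
        # The tail is shorter than the term, so no match is ever counted twice.
        total += buffer.count(term)
        # Carry the last `keep` bytes forward so a split match is seen whole.
        tail = buffer[-keep:] if keep else b''
    return total


# Usage with an in-memory stand-in for a file and a deliberately tiny chunk size.
sample = io.BytesIO(b'abPermissionLevelcdPERMISSIONLEVEL')
print(count_in_chunks(sample, b'permissionlevel', chunk_size=4))  # prints 2
```

With a real file, pass the object from open(filePath, 'rb') in place of the BytesIO stand-in.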

Thanks very much for your reply. It made the difference!

- O