Python Forum

Greetings!
I’m trying to scan files in a directory, and I found that some of the files are corrupted somehow.
The files are standard text .log files.
I tried opening them in Notepad: no errors, but I do not see the content of the file.
I tried opening the files with Notepad++, and they appear to have one line:
NULNULNUL....
I thought I could break out of the file by using try/except, but it does not work.
The script just runs and runs... without stopping.
Here is a snippet I tried:

from pathlib import Path

for ef in Path('D:\\somedir\\').iterdir():
    print(f" File ->{ef}")
    try:
        with open(ef, 'r') as mfiler:
            for echl in mfiler:
                print(f" line ->{echl}")
    except OSError as oss:
        # report the file; echl may not exist yet if open() itself failed
        print(f"  bad file -> {ef}")

Thank you!

Have a look at this link on using iterdir()

for ef in Path('some path').iterdir() doesn't look right. I've not used it before, but it seems like you're looping a loop.

When I remove the file in question (the one I cannot open),
the snippet seems to work fine. I use the following all the time, and I don't think it is the problem.

for ef in Path('D:\\somedir\\').iterdir() :

The file is the problem, but I cannot abort or exit it.

Thank you

Maybe it is working fine. How large are these "corrupted" files? It would take a while to print a million empty strings.
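
To illustrate that point, here is a minimal sketch, assuming the damaged files hold NUL bytes and no line breaks. The helper name `describe_text_file`, the sample file, and `errors="replace"` are illustrative assumptions, not part of the original script. A NUL byte decodes as a valid character, so no exception is raised and the whole file comes back as one very long line.

```python
import tempfile
from pathlib import Path


def describe_text_file(path):
    """Return (line_count, nul_count) for a file read in text mode."""
    line_count = 0
    nul_count = 0
    # errors="replace" is an assumption so unrelated bad bytes do not abort the scan
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            line_count += 1
            nul_count += line.count("\0")
    return line_count, nul_count


# Build a small stand-in for one of the damaged logs (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.log"
    sample.write_bytes(b"\0" * 1000)
    result = describe_text_file(sample)

print(result)  # -> (1, 1000)
```

If that is what is happening, the except branch never runs, and printing a multi-megabyte line of NUL characters to a console may simply be slow.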

I think I found out how to fix the problem I have. See the snippet below.

The files that fail are not big, just 14 MB.
I understand the file has data, but the characters in the file are corrupted or unprintable.
The first line of each file starts with "YYMMDD HHMMSS", so I thought I could check for that.

from pathlib import Path
import re

for ef in Path('D:\\Somedir\\').iterdir():
    with open(ef, encoding='utf-8', errors='ignore') as mfiler:
        frt_ln = mfiler.readline()
        # raw string so \d reaches the regex engine as a digit class
        if not re.search(r"\d", frt_ln):
            print(f"  Found bad file -> {ef}")
            continue
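
A stricter variant of that first-line check could match the whole "YYMMDD HHMMSS" prefix instead of any single digit. This is only a sketch: the exact layout (six digits, whitespace, six digits) is assumed from the description above, and `TIMESTAMP_PREFIX` and `looks_like_log_line` are hypothetical names.

```python
import re

# Assumed layout: six digits, whitespace, six digits ("YYMMDD HHMMSS").
TIMESTAMP_PREFIX = re.compile(r"\d{6}\s+\d{6}")


def looks_like_log_line(line):
    """Return True when the line starts with a YYMMDD HHMMSS-style prefix."""
    return TIMESTAMP_PREFIX.match(line) is not None


print(looks_like_log_line("230415 101500 service started"))  # -> True
print(looks_like_log_line("\0\0\0\0"))                        # -> False
print(looks_like_log_line("error code 5"))                    # -> False
```

The last case shows the difference: a search for any digit would accept "error code 5", while the prefix match rejects it.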

I need to get the first and the last lines from each file, which I'll use later in the script, in case you are wondering why I'm going this way.
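
One simple way to collect the first and last lines is a single pass that remembers the most recent line. This is a rough sketch, not the approach used in the actual script; `first_and_last_line` and the sample content are made up for illustration, and `errors="replace"` is an assumption.

```python
import tempfile
from pathlib import Path


def first_and_last_line(path):
    """Return (first, last) lines of a text file, or (None, None) if it is empty."""
    first = None
    last = None
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            stripped = line.rstrip("\n")
            if first is None:
                first = stripped
            last = stripped  # overwritten on every line, so it ends as the last one
    return first, last


# Hypothetical sample log to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.log"
    sample.write_text(
        "230415 101500 start\n230415 101501 middle\n230415 101502 end\n",
        encoding="utf-8",
    )
    pair = first_and_last_line(sample)

print(pair)  # -> ('230415 101500 start', '230415 101502 end')
```

For files around 14 MB a full pass should be acceptable; seeking backwards from the end of the file would avoid reading everything but needs more care with encodings.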

Thank you.

You are printing out 14 MB files? Or is that just example code? If it is just example code, what kind of processing are you doing?

No, I'm not printing each line of the file. ;)

I'm searching for specific lines. It just happened that I cannot open some of the files.
I started looking for a way to abort the search of a file if it can't be opened/read or I can't print each line...

Thank you.

What makes you think you cannot open some of the files?

When I found the first file that the script failed on, I tried to open it with Notepad and Notepad++.
Notepad showed blank lines; Notepad++ showed a line "NULNULNULNUL...".
I created a script to collect the files where the script fails to read the first line, and I found that the NUL character is actually '\0'.
It appears "\0" is a valid Unicode character (NUL, U+0000), so reading it does not raise an error.
I added an if statement to my script:

if re.search("\0\0\0", mysring) :
    break

Also, I'm printing the file that fails and the line that fails to a file so I can debug the script...
I also found that the "NULNULNUL" line can occur anywhere in the file.

I appreciate your help!
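
Since the NUL run can appear anywhere in a file, one option is to pre-screen each file in binary mode before doing any line-by-line work. The sketch below is an assumption-laden illustration: `contains_nul` and `split_by_nul` are hypothetical helpers, the 64 KiB chunk size is arbitrary, and it flags a file on a single NUL byte, not on three in a row.

```python
import tempfile
from pathlib import Path


def contains_nul(path, chunk_size=64 * 1024):
    """Return True if the file holds at least one NUL byte."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return False  # reached end of file without seeing a NUL
            if b"\0" in chunk:
                return True


def split_by_nul(directory):
    """Sort the regular files in a directory into (clean, damaged) name lists."""
    clean, damaged = [], []
    for entry in sorted(Path(directory).iterdir()):
        if not entry.is_file():
            continue
        (damaged if contains_nul(entry) else clean).append(entry.name)
    return clean, damaged


# Hypothetical directory with one healthy and one damaged log.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "good.log").write_bytes(b"230415 101500 start\n")
    (Path(tmp) / "bad.log").write_bytes(b"230415 101500 start\n" + b"\0" * 50)
    groups = split_by_nul(tmp)

print(groups)  # -> (['good.log'], ['bad.log'])
```

This check assumes the logs are plain ASCII or UTF-8 text; files saved as UTF-16 contain zero bytes legitimately and would be flagged as damaged.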

tester_V

menator01

tester_V

deanhystad

tester_V

deanhystad

tester_V

deanhystad

tester_V