Python Forum
Failing reading a file and cannot exit it...




Failing reading a file and cannot exit it... - tester_V - Aug-19-2022

Greetings!
I'm trying to scan files in a directory, and I found that some of the files are somehow corrupted.
The files are standard text .log files.
I tried opening them in Notepad: there are no errors, but I do not see any content.
When I open the files in Notepad++, they appear to have a single line:
NULNULNUL....
I thought I could break out of the file by using try/except, but it does not work.
The script just runs and runs without stopping.
Here is a snippet I tried:
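from pathlib import Path

for ef in Path('D:\\somedir\\').iterdir():
    print(f" File ->{ef}")
    try:
        with open(ef, 'r') as mfiler:
            for echl in mfiler:
                print(f" line ->{echl}")
    except OSError as oss:
        # Report the file and the error; echl may be undefined (or stale)
        # if the failure happens before a line is read
        print(f"  bad file -> {ef}: {oss}")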
Thank you!


RE: Failing reading a file and cannot exit it... - menator01 - Aug-19-2022

Have a look at this link on using iterdir()

for ef in Path('some path').iterdir() doesn't look right. I've not used it before, but it seems like you're looping a loop.


RE: Failing reading a file and cannot exit it... - tester_V - Aug-19-2022

When I remove the file in question (the one I cannot open), the snippet seems to work fine.
I use the following line all the time, and I don't think it is the problem:
for ef in Path('D:\\somedir\\').iterdir() :
It is the file that is the problem, but I cannot abort or exit it.

Thank you


RE: Failing reading a file and cannot exit it... - deanhystad - Aug-19-2022

Maybe it is working fine. How large are these "corrupted" files? It would take a while to print a million empty strings.
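A quick way to test this idea: a minimal sketch, assuming the "corrupted" files are nothing but NUL bytes with no newline, as the Notepad++ view suggests. It builds a 1 MB stand-in file (the size is arbitrary) and reads it line by line like the original loop, but with an explicit UTF-8 encoding so the result does not depend on the Windows locale.

```python
import os
import tempfile

# Stand-in for one of the "corrupted" logs: NUL bytes and no newline at all.
# The 1 MB size is an assumption for the demo; the real files are about 14 MB.
fd, path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00" * 1_000_000)

# Text-mode reading raises no OSError and no UnicodeDecodeError: the byte 0x00
# is valid UTF-8 (and cp1252), so it simply decodes to the character "\x00".
with open(path, encoding="utf-8") as f:
    lines = list(f)

os.remove(path)
print(len(lines), len(lines[0]))  # prints: 1 1000000
```

Because nothing is raised, the except OSError branch never runs; the loop is likely just spending its time building and printing one enormous line of invisible characters.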


RE: Failing reading a file and cannot exit it... - tester_V - Aug-19-2022

I think I found a way to fix my problem. See the snippet below.

The files that fail are not big, just 14 MB.
I understand the files contain data, but the characters in them are corrupted or unprintable.
The first line of each file starts with "YYMMDD HHMMSS", so I thought I could check for that.
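from pathlib import Path
import re

for ef in Path('D:\\Somedir\\').iterdir():
    # errors='ignore' drops bytes that are not valid UTF-8 (NUL bytes are valid, so they stay)
    with open(ef, encoding='utf-8', errors='ignore') as mfiler:
        frt_ln = mfiler.readline()
        # A good first line starts with a "YYMMDD HHMMSS" timestamp, so it contains digits.
        # Raw string: "\d" in a plain string triggers an invalid-escape warning on recent Pythons.
        if not re.search(r"\d", frt_ln):
            print(f"  Found bad file -> {ef}")
            continue  # skip the rest of this file's processing (not shown here)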
In case you're wondering why I'm going this way: I need the first and last lines of each file, which I'll use later in the script.
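Since only the first and last lines are needed, there is no need to read every line in between. The sketch below is one way to do it, assuming "YYMMDD HHMMSS" means six digits, a space, and six digits, and that the last line is shorter than max_len bytes; first_and_last_lines, STAMP, and max_len are hypothetical names. It works in binary mode, caps the first readline() so a newline-free NUL file is not read whole, then seeks near the end of the file for the last line.

```python
import os
import re

# Assumed layout of a good first line: "YYMMDD HHMMSS ..." (six digits, space, six digits)
STAMP = re.compile(rb"\d{6} \d{6}")

def first_and_last_lines(path, max_len=4096):
    """Return (first_line, last_line) as bytes, or None if the first line has no timestamp.

    max_len is a hypothetical cap: it stops readline() from slurping a whole
    newline-free NUL file, and it assumes the last line is shorter than max_len bytes.
    """
    with open(path, "rb") as f:
        first = f.readline(max_len)
        if not STAMP.match(first):  # match() anchors at the start of the line
            return None
        # Jump near the end instead of reading every line in between
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - max_len))
        tail = f.read()
    last_lines = tail.rstrip(b"\r\n").splitlines()
    return first.rstrip(b"\r\n"), (last_lines[-1] if last_lines else b"")
```

The lines come back as bytes; call .decode("utf-8", errors="replace") on them if text is needed later in the script.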

Thank you.


RE: Failing reading a file and cannot exit it... - deanhystad - Aug-19-2022

You are printing out 14 MB files? Or is that just example code? If it is just example code, what kind of processing are you doing?


RE: Failing reading a file and cannot exit it... - tester_V - Aug-19-2022

No, I'm not printing each line of the file ;)
I'm searching for specific lines. It just happens that I cannot open some of the files.
I started looking for a way to abort searching a file if it can't be opened or read, or if I can't print each line...
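If the goal is to skip a damaged file before searching it, one option is to screen the raw bytes first. This is a minimal sketch, assuming a run of three NUL bytes (the same marker used later in this thread) reliably signals damage; contains_nul_run, NUL_RUN, and chunk_size are hypothetical names, and the walrus operator needs Python 3.8+. Binary mode means no decoding happens, and fixed-size chunks keep memory bounded even when a file has no newlines.

```python
NUL_RUN = b"\x00\x00\x00"  # assumption: three NULs in a row marks a damaged region

def contains_nul_run(path, chunk_size=1 << 20):
    """Scan a file in binary chunks; return True as soon as a NUL run is found."""
    tail = b""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            # Prepend the end of the previous chunk so a run split across
            # two chunks is still detected
            if NUL_RUN in tail + chunk:
                return True
            tail = chunk[-(len(NUL_RUN) - 1):]
    return False
```

In the directory loop this would be used as: if contains_nul_run(ef): print(f"  bad file -> {ef}") and then continue. The trade-off is that good files are read twice (once to screen, once to search), which is cheap at 14 MB per file.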

Thank you.


RE: Failing reading a file and cannot exit it... - deanhystad - Aug-19-2022

What makes you think you cannot open some of the files?


RE: Failing reading a file and cannot exit it... - tester_V - Aug-19-2022

When I found the first file the script failed on, I tried to open it with Notepad and Notepad++.
Notepad showed blank lines; Notepad++ showed a line of "NULNULNULNUL..."
I wrote a script to collect the files whose first line the script fails to read, and I found that the NUL character is actually '\0'.
It appears "\0" is a Unicode character (U+0000, the NUL control character).
I added an if statement to my script:

if re.search("\0\0\0", mystring):  # three NUL characters in a row
    break
Also, I'm writing the name of the failing file and the failing line to a file so I can debug the script...
I also found that the "NULNULNUL" line can appear anywhere in the file.
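The approach in this post can be sketched as a single search function. This is a minimal sketch, assuming the goal is to collect matching lines, stop reading a file at the first run of three NULs, and log the file and line number to a debug file; PATTERN (with "ERROR" as a placeholder for the lines actually searched for), search_file, and scan_dir are hypothetical names. A plain substring test does the same job as re.search for a literal like "\0\0\0".

```python
import re
from pathlib import Path

PATTERN = re.compile(r"ERROR")  # hypothetical placeholder for the "specific lines"
NUL_RUN = "\0\0\0"

def search_file(path, debug):
    """Yield matching lines; stop at the first NUL run and log where it was found."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            if NUL_RUN in line:
                debug.write(f"{path}: NUL run at line {lineno}\n")
                return  # abandon this file, keep whatever matched before the damage
            if PATTERN.search(line):
                yield line.rstrip("\n")

def scan_dir(folder, debug_path):
    """Search every file in folder, writing NUL-run locations to debug_path."""
    with open(debug_path, "w", encoding="utf-8") as debug:
        for ef in Path(folder).iterdir():
            if ef.is_file():
                for hit in search_file(ef, debug):
                    print(f"{ef.name}: {hit}")
```

One caveat: a damaged region with no newline is still read as one long line before the check runs, so the byte-level screen is the cheaper option when the NULs fill most of a file.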

I appreciate your help!