Python Forum
Failing reading a file and cannot exit it...
#1
Greetings!
I’m trying to scan files in a directory, and I found some files are corrupted somehow.
The files are standard text.log.
I tried opening them in Notepad – no errors and I do not see the content of the file.
When I open the files with Notepad++, they appear to contain a single line:
NULNULNUL....
I thought I could break out of the file by using try/except, but it does not work.
The script just runs and runs without stopping.
Here is a snippet I tried:
from pathlib import Path

for ef in Path('D:\\somedir\\').iterdir():
    print(f" File ->{ef}")
    try:
        with open(ef, 'r') as mfiler:
            for echl in mfiler:
                print(f" line ->{echl}")
    except OSError as oss:
        # Report the path: echl is unset if open() itself fails
        print(f"  bad file -> {ef}: {oss}")
Thank you!
Reply
#2
Have a look at this link on using iterdir()

for ef in Path('some path').iterdir() doesn't look right. I've not used it before, but it seems like you're looping a loop.
Reply
#3
When I remove the file in question (the one I cannot open), the snippet seems to work fine. I use the following all the time, and I do not think it is the problem:
for ef in Path('D:\\somedir\\').iterdir() :
The file itself is the problem, but I cannot abort or exit reading it.

Thank you
Reply
#4
Maybe it is working fine. How large are these "corrupted" files? It would take a while to print a million empty strings.
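A minimal sketch of why the try/except in post #1 may never fire, assuming the bad files really contain only NUL bytes: NUL (U+0000) decodes as valid text, so no OSError is raised and the whole file comes back as one very long line. The function name read_lines_or_report and the sample file nul_only.log are made up for this illustration.

```python
import os
import tempfile


def read_lines_or_report(path):
    """Return the number of lines read, or None if the OS refused the file."""
    try:
        with open(path, "r", encoding="utf-8", errors="replace") as fh:
            return sum(1 for _ in fh)
    except OSError as err:
        print(f"bad file -> {path}: {err}")
        return None


# Hypothetical sample: 1024 NUL bytes and no newline at all.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "nul_only.log")
    with open(sample, "wb") as fh:
        fh.write(b"\x00" * 1024)
    line_count = read_lines_or_report(sample)

print(line_count)  # the file reads cleanly as a single line
```

In other words, the loop is probably not stuck. It is printing one 14 MB line of unprintable characters, which can take a long time in a console.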
Reply
#5
I think I found out how to fix the problem I have. See the snippet below.

The files that fail are not big, just 14 MB.
I understand that the files contain data, but the characters in them are corrupted or unprintable.
The first line of each good file starts with "YYMMDD HHMMSS", so I thought I could check for that.
from pathlib import Path
import re

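for ef in Path('D:\\Somedir\\').iterdir():
    # errors='ignore' silently drops bytes that are not valid UTF-8
    with open(ef, encoding='utf-8', errors='ignore') as mfiler:
        frt_ln = mfiler.readline()
        # A good first line starts with "YYMMDD HHMMSS", so it must contain a digit
        if not re.search(r"\d", frt_ln):
            print(f"  Found bad file -> {ef}")
            continue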
I need to get the first and the last lines from each file, which I'll use later in the script, in case you are wondering why I'm going this way.
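One possible way to get the first and last lines without reading a whole 14 MB file is to read the first line normally and then walk backwards from the end in binary mode. This is only a sketch: first_and_last_line is a hypothetical helper, the 4096-byte block size is arbitrary, and trailing blank lines are skipped, so "last" here means the last non-empty line.

```python
import os
import tempfile


def first_and_last_line(path, encoding="utf-8", block=4096):
    """Return (first_line, last_non_empty_line) without reading the whole file."""
    with open(path, "rb") as fh:
        first = fh.readline()
        fh.seek(0, os.SEEK_END)
        pos = fh.tell()
        tail = b""
        # Read blocks from the end until the tail holds a complete last line.
        while pos > 0:
            step = min(block, pos)
            pos -= step
            fh.seek(pos)
            tail = fh.read(step) + tail
            if b"\n" in tail.rstrip(b"\r\n"):
                break
        last = tail.rstrip(b"\r\n").rsplit(b"\n", 1)[-1]

    def decode(raw):
        # Undecodable bytes become U+FFFD instead of raising.
        return raw.decode(encoding, errors="replace").rstrip("\r\n")

    return decode(first), decode(last)


# Hypothetical sample log whose lines start with "YYMMDD HHMMSS".
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.log")
    with open(sample, "wb") as fh:
        fh.write(b"230101 120000 start\r\nmiddle\r\n230101 130000 end\r\n")
    result = first_and_last_line(sample)

print(result)
```

Opening in binary mode keeps the seek arithmetic simple, since text-mode files do not support seeking relative to the end.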

Thank you.
Reply
#6
You are printing out 14 MB files? Or is that just example code? If it is just example code, what kind of processing are you doing?
Reply
#7
No, I'm not printing each line of the file.
I'm searching for specific lines. It just happened that I cannot open some of the files.
I started looking for a way to abort the search of a file if it can't be opened or read, or if I can't print each line...

Thank you.
Reply
#8
What makes you think you cannot open some of the files?
Reply
#9
When I found the first file that the script failed on, I tried to open it with Notepad and Notepad++.
Notepad showed blank lines, and Notepad++ showed a line "NULNULNULNUL...".
I wrote a script to collect the files whose first line the script fails to read, and I found that the NUL character is actually '\0'.
It appears "\0" is the NUL character (U+0000), which is a valid Unicode character.
I added an if statement to my script:

if re.search("\0\0\0", mystring) :
    break
Also, I'm writing the name of each file that fails and the line that fails to a file so I can debug the script...
I also found that the "NULNULNUL" line can appear anywhere in the file.
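Since the NUL run can appear anywhere, one option is to scan each file in binary chunks before doing any line processing. The sketch below is an assumption about how that could look, not the script used in this thread: contains_nul_run, its run_length of 3 (matching the "\0\0\0" check above), and the chunk size are all illustrative choices.

```python
import os
import tempfile


def contains_nul_run(path, run_length=3, chunk_size=64 * 1024):
    """Return True if the file contains run_length consecutive NUL bytes."""
    needle = b"\x00" * run_length
    overlap = b""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return False
            # Prepend the end of the previous chunk so a run that
            # straddles a chunk boundary is still detected.
            if needle in overlap + chunk:
                return True
            overlap = chunk[-(run_length - 1):] if run_length > 1 else b""


# Hypothetical sample files: one clean, one with NULs in the middle.
with tempfile.TemporaryDirectory() as tmp:
    clean = os.path.join(tmp, "clean.log")
    dirty = os.path.join(tmp, "dirty.log")
    with open(clean, "wb") as fh:
        fh.write(b"230101 120000 ok\n")
    with open(dirty, "wb") as fh:
        fh.write(b"abc\x00\x00\x00def\n")
    clean_flag = contains_nul_run(clean)
    dirty_flag = contains_nul_run(dirty)
    # A tiny chunk size forces the run to straddle a chunk boundary.
    boundary_flag = contains_nul_run(dirty, chunk_size=4)

print(clean_flag, dirty_flag, boundary_flag)
```

A file flagged this way can be logged and skipped up front, rather than breaking out of the line loop halfway through.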

I appreciate your help!
Reply

