I open a file for reading and read it like this:
i = open(ifn)
for line in i:
    ...
    ...
It reads over 352,000 lines, then raises a UnicodeDecodeError exception. I just want to skip that. If it were some statement in the loop body, I would wrap it in try: and do except: pass. But this is the loop control itself. How can I skip the exception here?
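A minimal sketch of one way around this (not something posted in the thread): open the file in binary mode, so the for loop only splits bytes on newlines and never decodes anything, then decode each line inside the loop body, where an ordinary try/except works. `utf8_lines` is a hypothetical helper name.

```python
def utf8_lines(path):
    """Yield each line of *path* decoded as UTF-8, skipping lines that aren't valid UTF-8."""
    # Binary mode: iteration only splits on b"\n" and never decodes,
    # so the loop control itself cannot raise UnicodeDecodeError.
    with open(path, "rb") as f:
        for raw in f:
            try:
                yield raw.decode("utf-8")
            except UnicodeDecodeError:
                # the decode now happens in the loop body, so try/except can skip it
                continue
```

Usage would be `for line in utf8_lines(ifn): ...`. Splitting raw bytes on b"\n" is safe for UTF-8 because every byte of a multibyte character is 0x80 or above, so a newline byte is always a real newline. One difference from text mode: "\r\n" line endings are not translated.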
(Oct-23-2018, 07:37 AM) wavic wrote:
i = open(ifn, errors='replace')  # replaces undecodable bytes with U+FFFD, the Unicode replacement character
More here: https://docs.python.org/3/library/functions.html#open
You may like 'backslashreplace', I presume.
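To see what the two handlers actually produce, here is a small sketch decoding a made-up line that contains a lone byte 0xE9 ('é' in Latin-1), which is not valid UTF-8:

```python
bad = b"/usr/share/doc/caf\xe9\n"  # hypothetical line with one stray Latin-1 byte

# 'replace' substitutes U+FFFD, the Unicode replacement character
replaced = bad.decode("utf-8", errors="replace")
# 'backslashreplace' substitutes the literal four characters \xe9
escaped = bad.decode("utf-8", errors="backslashreplace")

print(repr(replaced))
print(repr(escaped))
```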
'backslashreplace' worked. Now I want to add some code to detect those backslashes so I can skip those lines. I suspect the file is not properly encoded in UTF-8.
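One catch with detecting backslashes: a path can legitimately contain a backslash, so finding one doesn't prove the line was bad. A sketch using a swapped-in technique, Python's 'surrogateescape' error handler instead of 'backslashreplace': each undecodable byte 0xNN becomes the lone surrogate U+DCNN (range U+DC80..U+DCFF), which valid UTF-8 never decodes to, so testing for those code points identifies bad lines exactly. `valid_utf8_lines` is a hypothetical helper name.

```python
def valid_utf8_lines(path):
    """Yield only the lines of *path* that decoded cleanly as UTF-8."""
    # 'surrogateescape' never raises; it maps each undecodable byte
    # (always 0x80..0xFF) to a lone surrogate in U+DC80..U+DCFF.
    with open(path, encoding="utf-8", errors="surrogateescape") as f:
        for line in f:
            # valid UTF-8 can never produce these code points, so any of
            # them marks a line that contained undecodable bytes
            if not any("\udc80" <= ch <= "\udcff" for ch in line):
                yield line
```

Unlike a backslash check, a line that really contains text such as `\x41` is kept.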
(Oct-23-2018, 08:10 AM) Larz60+ wrote:
Either replace as wavic suggests, or use the proper codec.
You can use https://github.com/chardet/chardet to detect (most of the time) the file's actual codec.
It is supposed to be encoded in UTF-8. Apparently it isn't. I just want to skip the lines that are not valid UTF-8.
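Before deciding how to skip, it may help to see which lines are bad and which bytes trip the decoder; the offending bytes can hint at the encoding actually used. A minimal sketch, with `find_bad_lines` as a hypothetical helper built on the `start`/`end` attributes of UnicodeDecodeError:

```python
def find_bad_lines(path):
    """Return (line_number, offending_bytes) for every line that isn't valid UTF-8."""
    bad = []
    with open(path, "rb") as f:  # binary mode, so reading itself never fails to decode
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as e:
                # e.start/e.end delimit the first undecodable byte run in this line
                bad.append((lineno, raw[e.start:e.end]))
    return bad
```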
Quote: I just want to skip the lines that are not valid UTF-8.
You can tell open() to handle decoding errors differently. This should work:
i = open(ifn, encoding="utf-8", errors="ignore")
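Worth noting about errors="ignore": it does not skip the line, it keeps it and silently removes only the undecodable bytes, so a path can come through altered. A quick sketch with a made-up path:

```python
line = b"/usr/share/doc/caf\xe9/README\n"  # hypothetical path with one stray Latin-1 byte

# 'ignore' drops the bad byte 0xE9 and returns the rest of the line
cleaned = line.decode("utf-8", errors="ignore")
print(repr(cleaned))
```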
You could also write a wrapper class that implements the iterator protocol and ignores any error except StopIteration. That's probably the dirty, ugly way to do it, though.
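class LineSkipper:
    """Iterate over *iterable*, skipping any item whose retrieval raises an error."""

    def __init__(self, iterable):
        self.iterable = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            try:
                return next(self.iterable)
            except StopIteration:
                # re-raise StopIteration so the caller knows we've reached the end of the iterable
                raise
            except Exception:
                # ignore any other error reading the line and skip it entirely
                # (Exception rather than a bare except, so Ctrl-C still works;
                # caveat: with a text-mode file, a UnicodeDecodeError may discard
                # a whole buffered chunk of input, not just the offending line)
                pass


with open("spam.txt", encoding="utf-8") as f:
    for line in LineSkipper(f):
        print(line.strip())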
I just want to keep this simple. The file is a list of every file (full path) that could be installed by every package in the repositories I have configured for my Ubuntu system, along with the name of the package it comes in. I populated a database with it so I can search by file name.
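For that use case, a sketch of loading the list into a database while dropping invalid lines. Assumptions, since the thread shows neither the file format nor the database: each line is taken to be `<path><whitespace><package>` with the package name last, and sqlite3 with a hypothetical `files(path, package)` table stands in for whatever database is actually used. `load_contents` is a hypothetical helper name.

```python
import sqlite3


def load_contents(contents_file, db=":memory:"):
    """Load 'path  package' lines into SQLite, skipping lines that aren't valid UTF-8."""
    conn = sqlite3.connect(db)
    # hypothetical schema, for illustration only
    conn.execute("CREATE TABLE IF NOT EXISTS files (path TEXT, package TEXT)")
    rows = []
    with open(contents_file, "rb") as f:
        for raw in f:
            try:
                line = raw.decode("utf-8")
            except UnicodeDecodeError:
                continue  # not valid UTF-8: skip the whole line
            # assumed format: the package name is the last whitespace-separated field
            parts = line.rsplit(None, 1)
            if len(parts) == 2:
                rows.append(tuple(parts))
    conn.executemany("INSERT INTO files VALUES (?, ?)", rows)
    conn.commit()
    return conn
```

Searching by file name could then be a query such as `SELECT package FROM files WHERE path LIKE '%/ls'`.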