Python Forum

exception during iteration loop
i open a file for reading and read it like:
i=open(ifn)
for line in i:
    ...
    ...
it reads over 352000 lines, then gets a UnicodeDecodeError exception. i just want to skip that. if it were some statement in the loop body i would put in a try: and do except: pass. but this is the loop control itself. how can i skip this exception?
i = open(ifn, errors='replace') # on reading, undecodable bytes become U+FFFD ('\ufffd'); '?' is used only when encoding
More here: https://docs.python.org/3/library/functions.html#open

You may like 'backslashreplace', I presume.
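A quick way to see what each error handler does is to decode the same bytes with each one. This is a standalone sketch: the sample bytes and the helper name decode_with are made up for illustration, and 'backslashreplace' only works for decoding on Python 3.5 or later.

```python
# Hypothetical sample: 0xff is never valid in UTF-8.
raw = b"usr/share/doc/pkg\xff/README\n"


def decode_with(handler):
    """Decode raw as UTF-8 with the given error handler; report a failure as text."""
    try:
        return raw.decode("utf-8", errors=handler)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc.reason}"


for handler in ("strict", "replace", "backslashreplace", "ignore"):
    print(f"{handler:16} -> {decode_with(handler)!r}")
```

Passing the same errors= value to open() applies that handler to every line read from the file.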
Either replace as wavic suggests, or use the proper codec.

You can use https://github.com/chardet/chardet to detect the proper file codec (it gets it right most of the time).
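chardet is a third-party package, so it is not shown here. A stdlib-only check that complements it (it does not guess an encoding) is to try decoding the raw bytes as UTF-8 and, on failure, use the start attribute of the UnicodeDecodeError to locate the offending line. first_utf8_error is a hypothetical helper name, and reading the whole file with f.read() assumes it fits comfortably in memory.

```python
def first_utf8_error(data):
    """Return (line_number, byte_offset, reason) for the first invalid UTF-8
    sequence in data (bytes), or None if everything decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first bad byte; counting the newlines
        # before it gives the 1-based line number it sits on
        line_number = data.count(b"\n", 0, exc.start) + 1
        return line_number, exc.start, exc.reason
    return None


# Usage: read in binary mode so nothing is decoded on the way in.
#   with open(ifn, "rb") as f:
#       print(first_utf8_error(f.read()))
print(first_utf8_error(b"ok line\nbad \xff line\n"))
```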
(Oct-23-2018, 07:37 AM) wavic wrote:
i = open(ifn, errors='replace') # on reading, undecodable bytes become U+FFFD ('\ufffd'); '?' is used only when encoding
More here: https://docs.python.org/3/library/functions.html#open

You may like 'backslashreplace', I presume.
'backslashreplace' worked. now i want to add some code to detect those backslashes to skip those lines. i suspect the file is not properly encoded in UTF-8.
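A minimal sketch of that detection, assuming the file is opened with encoding="utf-8" and errors="backslashreplace": each undecodable byte becomes a four-character escape such as \xff, so a regular expression can flag those lines. The names ESCAPE and had_bad_bytes are hypothetical, and the check is only a heuristic: a line that legitimately contains a backslash, an x and two hex digits would be skipped too.

```python
import re

# Under errors="backslashreplace", each undecodable byte is written as \xNN
# (a backslash, "x", and two lowercase hex digits).
ESCAPE = re.compile(r"\\x[0-9a-f]{2}")


def had_bad_bytes(line):
    """Heuristic: True if line contains an escape that backslashreplace produces."""
    return ESCAPE.search(line) is not None


# Demo on in-memory data; with the real file it would look like:
#   with open(ifn, encoding="utf-8", errors="backslashreplace") as i:
#       for line in i:
#           if had_bad_bytes(line):
#               continue
sample = b"good/path\nbad/\xffpath\n".decode("utf-8", errors="backslashreplace")
kept = [line for line in sample.splitlines() if not had_bad_bytes(line)]
print(kept)  # ['good/path']
```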

(Oct-23-2018, 08:10 AM) Larz60+ wrote:
Either replace as wavic suggests, or use the proper codec.

You can use https://github.com/chardet/chardet to detect the proper file codec (it gets it right most of the time).
it is supposed to be encoded in UTF-8. apparently it isn't. i just want to skip the lines that are not valid UTF-8.
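One way to skip exactly the lines that are not valid UTF-8 is to open the file in binary mode and decode each line yourself. This is a sketch, and valid_utf8_lines is a hypothetical helper name. It relies on the fact that the newline byte 0x0A never occurs inside a multi-byte UTF-8 sequence, so splitting on b"\n" before decoding cannot cut a valid character in half. By contrast, errors="ignore" keeps the rest of a bad line, and in text mode CPython decodes the file in chunks of several kilobytes rather than per line, so as far as I can tell, catching the error around next() may drop more than just the offending line.

```python
def valid_utf8_lines(byte_lines):
    """Yield each line decoded as UTF-8, skipping lines that are not valid UTF-8.

    byte_lines can be any iterable of bytes lines, e.g. a file opened with "rb".
    """
    for raw in byte_lines:
        try:
            yield raw.decode("utf-8")
        except UnicodeDecodeError:
            # only this line is dropped; the next one is decoded independently
            continue


# Usage with the names from the question:
#   with open(ifn, "rb") as i:
#       for line in valid_utf8_lines(i):
#           ...
demo = [b"good one\n", b"bad \xff one\n", b"good two\n"]
print(list(valid_utf8_lines(demo)))  # ['good one\n', 'good two\n']
```

Binary mode also skips newline translation, so each line ends exactly as stored in the file; strip it as needed.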
Quote: i just want to skip the lines that are not valid UTF-8.
You can change how decoding errors are handled. This should work:
i = open(ifn, encoding="utf-8", errors="ignore")  # drops the undecodable bytes; the rest of the line is still returned
You could also write a wrapper class that implements the iterator protocol and just ignores decoding errors, while still letting StopIteration through. That's probably the dirty, ugly way to do it, though.

class LineSkipper:
    """Iterate over an iterable, skipping any item whose retrieval raises UnicodeDecodeError."""

    def __init__(self, iterable):
        self.iterable = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            try:
                # StopIteration from the wrapped iterator propagates unchanged,
                # so the caller knows we've reached the end of the iterable
                return next(self.iterable)
            except UnicodeDecodeError:
                # skip the unreadable data and try again; catching only this
                # error (not a bare except:) avoids swallowing Ctrl-C or looping
                # forever on an error that repeats, e.g. reading a closed file
                continue

with open("spam.txt", encoding="utf-8") as f:
    for line in LineSkipper(f):
        print(line.strip())
i just want to keep this simple. the file is a list of every file (full path) that could be installed by every package in the repositories i have configured for my Ubuntu system, along with the name of the package it comes in. i populated a database with it so i can search by file name.