Python Forum
exception during iteration loop - Printable Version



exception during iteration loop - Skaperen - Oct-23-2018

i open a file for reading and read it like:
i=open(ifn)
for line in i:
    ...
    ...
it reads over 352000 lines, then raises a UnicodeDecodeError exception. i just want to skip that. if it were some statement in the loop body i would put it in a try: and do except: pass. but this is the loop control itself. how can i skip this exception?


RE: exception during iteration loop - wavic - Oct-23-2018

i = open(ifn, errors='replace') # when reading, each undecodable byte is replaced with U+FFFD (the Unicode replacement character)
More here: https://docs.python.org/3/library/functions.html#open

You may like 'backslashreplace', I presume ;)
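
As a small illustration of how these error handlers differ when decoding, here is a sketch using a made-up byte string (the sample data is an assumption, not taken from the file in question). The byte 0xE9 is "é" in Latin-1 but is not valid UTF-8 in this position:

```python
# Hypothetical sample: Latin-1 encoded text that is not valid UTF-8.
raw = b"caf\xe9 menu\n"

# 'replace' substitutes U+FFFD for the bad byte.
replaced = raw.decode("utf-8", errors="replace")

# 'backslashreplace' substitutes a visible escape such as \xe9.
escaped = raw.decode("utf-8", errors="backslashreplace")

# 'ignore' silently drops the bad byte.
ignored = raw.decode("utf-8", errors="ignore")

print(repr(replaced))
print(repr(escaped))
print(repr(ignored))
```

The same `errors=` values can be passed to `open()`, which applies them to every line read from the file.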


RE: exception during iteration loop - Larz60+ - Oct-23-2018

either replace as wavic suggests, or use proper codec

You can use: https://github.com/chardet/chardet
to detect (most of the time) the proper file codec


RE: exception during iteration loop - Skaperen - Oct-23-2018

(Oct-23-2018, 07:37 AM)wavic Wrote:
i = open(ifn, errors='replace') # when reading, each undecodable byte is replaced with U+FFFD (the Unicode replacement character)
More here: https://docs.python.org/3/library/functions.html#open

You may like 'backslashreplace', I presume ;)
'backslashreplace' worked. now i want to add some code to detect those backslashes to skip those lines. i suspect the file is not properly encoded in UTF-8.

(Oct-23-2018, 08:10 AM)Larz60+ Wrote: either replace as wavic suggests, or use proper codec

You can use: https://github.com/chardet/chardet
to detect (most of the time) the proper file codec
it is supposed to be encoded in UTF-8. apparently it isn't. i just want to skip the lines that are not valid UTF-8.
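
Looking for backslashes after 'backslashreplace' can misfire on lines that legitimately contain a backslash. One possible alternative, sketched here with made-up sample data, is the 'surrogateescape' error handler: each undecodable byte becomes a lone surrogate in the range U+DC80 to U+DCFF, which valid UTF-8 input never produces. The helper name `has_undecodable_bytes` is hypothetical, and `io.BytesIO` stands in for the real file.

```python
import io

def has_undecodable_bytes(line):
    """Return True if the line contains a surrogate left by 'surrogateescape'."""
    return any("\udc80" <= ch <= "\udcff" for ch in line)

# Stand-in for the real file: one line holds an invalid byte (0xFF),
# another holds a perfectly legitimate backslash.
raw = io.BytesIO(b"ok\nbad \xff here\nback\\slash ok\n")
text = io.TextIOWrapper(raw, encoding="utf-8", errors="surrogateescape")

# Keep only lines that decoded cleanly.
kept = [line for line in text if not has_undecodable_bytes(line)]
print(kept)
```

With a real file the equivalent would be `open(ifn, encoding="utf-8", errors="surrogateescape")`, followed by the same check inside the loop.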


RE: exception during iteration loop - Larz60+ - Oct-23-2018

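Quote: i just want to skip the lines that are not valid UTF-8.
You can override how decoding errors are handled. This should work, though note that errors="ignore" drops only the undecodable bytes and keeps the rest of each line: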
i = open(ifn, encoding="utf-8", errors="ignore")
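
To skip whole lines rather than individual bytes, one option is to read the file in binary mode and decode each line yourself, so the try/except sits inside the loop body instead of in the loop control. This is only a sketch: the function name `iter_valid_utf8_lines` is hypothetical, and `io.BytesIO` stands in for the real file.

```python
import io

def iter_valid_utf8_lines(binary_file):
    """Yield decoded lines, skipping any line that is not valid UTF-8."""
    # Iterating a binary file yields bytes objects split on b"\n".
    for raw in binary_file:
        try:
            yield raw.decode("utf-8")
        except UnicodeDecodeError:
            # Skip the whole line if any part of it fails to decode.
            continue

# Stand-in for the real file, with one invalid line in the middle.
sample = io.BytesIO(b"good line\nbad \xff line\nalso good\n")
lines = list(iter_valid_utf8_lines(sample))
print(lines)
```

With a real file this would be used as `with open(ifn, "rb") as f:` and then `for line in iter_valid_utf8_lines(f):`. Binary mode does not translate line endings, so a file with "\r\n" endings keeps the "\r" in each decoded line.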



RE: exception during iteration loop - nilamo - Oct-23-2018

You could also write a wrapper class that implements the iterator protocol that just ignores any error except StopIteration. That's probably the dirty ugly way to do it, though.

class LineSkipper:
    def __init__(self, iterable):
        self.iterable = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            try:
                return next(self.iterable)
            except StopIteration:
                # re-raise the stopiteration, so the caller knows we've reached the end of the iterable
                raise
            except Exception:
                # ignore any errors reading the line and skip it entirely
                # (Exception rather than a bare except, so Ctrl-C still interrupts the loop)
                pass

with open("spam.txt") as f:
    for line in LineSkipper(f):
        print(line.strip())



RE: exception during iteration loop - Skaperen - Oct-24-2018

i just want to keep this simple. the file is a list of every file (full path) that could be installed for every package in the repositories i have configured for my ubuntu system along with the package name it comes in. i populated a database with it so i can search by file name.