Thread: exception during iteration loop (/thread-13605.html)
Python Forum (https://python-forum.io), Python Coding, General Coding Help
exception during iteration loop - Skaperen - Oct-23-2018

i open a file for reading and read it like:

    i = open(ifn)
    for line in i:
        ...

...it reads over 352000 lines then gets a UnicodeDecodeError exception. i just want to skip that. if it were some statement in the loop body i would put in a try: and do except: pass. but this is the loop control itself. how can i skip this exception?

RE: exception during iteration loop - wavic - Oct-23-2018

    i = open(ifn, errors='replace')  # this will replace the character with '?' for example

More here: https://docs.python.org/3/library/functions.html#open

You may like 'backslashreplace', I presume.

RE: exception during iteration loop - Larz60+ - Oct-23-2018

Either replace as wavic suggests, or use the proper codec.
You can use https://github.com/chardet/chardet to detect (most of the time) the proper file codec.

RE: exception during iteration loop - Skaperen - Oct-23-2018

(Oct-23-2018, 07:37 AM)wavic Wrote:
    i = open(ifn, errors='replace')  # this will replace the character with '?' for example
More here: https://docs.python.org/3/library/functions.html#open

'backslashreplace' worked. now i want to add some code to detect those backslashes to skip those lines. i suspect the file is not properly encoded in UTF-8.

(Oct-23-2018, 08:10 AM)Larz60+ Wrote: either replace as wavic suggests, or use proper codec

it is supposed to be encoded in UTF-8. apparently it isn't. i just want to skip the lines that are not valid UTF-8.

RE: exception during iteration loop - Larz60+ - Oct-23-2018

Quote: i just want to skip the lines that are not valid UTF-8.

You can override decode. This should work:

    i = open(ifn, encoding="utf-8", errors="ignore")

RE: exception during iteration loop - nilamo - Oct-23-2018

You could also write a wrapper class that implements the iterator protocol and just ignores any error except StopIteration.
That's probably the dirty ugly way to do it, though.

    class LineSkipper:
        def __init__(self, iterable):
            self.iterable = iter(iterable)

        def __iter__(self):
            return self

        def __next__(self):
            while True:
                try:
                    return next(self.iterable)
                except StopIteration:
                    # re-raise the StopIteration, so the caller knows we've reached the end of the iterable
                    raise
                except Exception:
                    # ignore any errors reading the line and skip it entirely
                    # (Exception rather than a bare except, so Ctrl-C still interrupts the loop)
                    pass

    with open("spam.txt") as f:
        for line in LineSkipper(f):
            print(line.strip())

RE: exception during iteration loop - Skaperen - Oct-24-2018

i just want to keep this simple. the file is a list of every file (full path) that could be installed for every package in the repositories i have configured for my ubuntu system, along with the name of the package it comes in. i populated a database with it so i can search by file name.
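
For the goal stated earlier in the thread (skip the lines that are not valid UTF-8), one simple option is to open the file in binary mode and decode each line inside the loop body, where a try/except fits naturally. This is only a sketch: the function name `iter_valid_utf8_lines` is made up for illustration and is not from any post above. Note that `errors="ignore"` removes only the undecodable bytes and keeps the rest of the line, so by itself it does not skip lines.

```python
def iter_valid_utf8_lines(path):
    """Yield each line of the file that decodes as UTF-8; skip lines that do not.

    Hypothetical helper for illustration, not part of the original thread.
    """
    # Binary mode: iterating yields raw bytes, so the loop control never decodes
    # and cannot raise UnicodeDecodeError.
    with open(path, "rb") as f:
        for raw in f:  # binary files are split on b"\n" only
            try:
                # strict decoding (the default) raises on any invalid byte sequence
                yield raw.decode("utf-8")
            except UnicodeDecodeError:
                # skip only this line and carry on with the next one
                continue
```

Used in place of the original loop, this would look like `for line in iter_valid_utf8_lines(ifn): ...`. Because the file is read in binary mode, no newline translation happens, so a line ending in "\r\n" keeps both characters after decoding.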