Posts: 4,647
Threads: 1,494
Joined: Sep 2016
Oct-23-2018, 06:54 AM
(This post was last modified: Oct-23-2018, 06:54 AM by Skaperen.)
I open a file for reading and read it like this:
i = open(ifn)
for line in i:
    ...
    ...
It reads over 352,000 lines, then raises a UnicodeDecodeError exception. I just want to skip that line. If it were a statement in the loop body, I would wrap it in try: and use except: pass, but here the error comes from the loop control itself. How can I skip this exception?
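The `for` statement calls `next()` on the file object, and the UTF-8 decoding happens inside that call, so there is no statement in the loop body to wrap. Resuming the same text-mode iterator after it has raised is also not something to rely on. A minimal sketch of one workaround, assuming newline-delimited data: open the file in binary mode so iteration never decodes anything, then decode each line yourself inside try/except. Splitting on `b"\n"` is safe for UTF-8 because the byte 0x0A never occurs inside a multibyte sequence. The function name `valid_utf8_lines` is hypothetical.

```python
def valid_utf8_lines(path):
    """Yield each line of `path` decoded as UTF-8, skipping lines that are not valid UTF-8."""
    with open(path, "rb") as f:  # binary mode: iterating yields raw bytes, nothing is decoded yet
        for raw in f:
            try:
                yield raw.decode("utf-8")  # strict decoding of just this one line
            except UnicodeDecodeError:
                continue  # skip only the bad line and keep reading
```

Usage would be `for line in valid_utf8_lines(ifn): ...`. Note that binary mode does no newline translation, so a file with `\r\n` line endings keeps the `\r` in each line.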
Tradition is peer pressure from dead people
What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Oct-23-2018, 08:26 PM
(This post was last modified: Oct-23-2018, 08:28 PM by Skaperen.)
(Oct-23-2018, 07:37 AM) wavic Wrote:
i = open(ifn, errors='replace') # this will replace the character with '?' for example
More here: https://docs.python.org/3/library/functions.html#open
You may like 'backslashreplace'. I presume 
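The `errors` argument of `open()` is handed to the decoder, so the handlers can be compared on a single bytes object. One detail worth knowing: when decoding, `'replace'` inserts U+FFFD (the Unicode replacement character), not `'?'`; the `'?'` substitution is what `'replace'` does when encoding. The sample bytes below are made up for illustration.

```python
# 0xE9 is 'é' in Latin-1/cp1252, but on its own it is not valid UTF-8
bad = b"caf\xe9 au lait\n"

for handler in ("replace", "backslashreplace", "ignore"):
    # same handlers that open(..., errors=handler) would use
    print(handler, repr(bad.decode("utf-8", errors=handler)))
```

`'replace'` yields `'caf\ufffd au lait\n'`, `'backslashreplace'` yields the four literal characters `\xe9` in place of the byte, and `'ignore'` drops the byte entirely.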
'backslashreplace' worked. Now I want to add some code to detect those backslashes so I can skip those lines. I suspect the file is not properly encoded in UTF-8.
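Searching for backslashes after `'backslashreplace'` is ambiguous, because a path can legitimately contain a backslash or even a literal `\x` sequence. A sketch of a different technique, swapped in here: open with `errors='surrogateescape'`, which maps each undecodable byte 0xNN to the lone surrogate U+DCNN (range U+DC80 to U+DCFF). Python's strict UTF-8 decoder never produces those code points from valid input, so their presence marks exactly the lines that had bad bytes. The helper names `had_bad_bytes` and `clean_lines` are hypothetical.

```python
def had_bad_bytes(line):
    """True if `line` contains an undecodable byte escaped by 'surrogateescape'."""
    # surrogateescape turns each bad byte 0xNN (always >= 0x80) into U+DCNN
    return any("\udc80" <= ch <= "\udcff" for ch in line)


def clean_lines(path):
    """Yield the lines of `path` that were valid UTF-8, skipping the rest."""
    with open(path, encoding="utf-8", errors="surrogateescape") as f:
        for line in f:
            if had_bad_bytes(line):
                continue  # this line contained bytes that are not valid UTF-8
            yield line
```

A side benefit of this handler is that `line.encode("utf-8", "surrogateescape")` gives back the original bytes, so a skipped line could still be logged or re-decoded with another codec.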
(Oct-23-2018, 08:10 AM) Larz60+ Wrote: Either replace as wavic suggests, or use the proper codec.
You can use https://github.com/chardet/chardet to detect (most of the time) the proper file codec.
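chardet is a third-party package that guesses statistically. As a much cruder, stdlib-only stand-in (a different technique, not what chardet does), one can check which of a few candidate codecs decodes a sample of the file without any error. The candidate list and the name `first_clean_codec` are assumptions for illustration. `'latin-1'` maps every byte to some character, so it always succeeds and only makes sense as the last-resort entry.

```python
def first_clean_codec(sample, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first codec in `candidates` that decodes `sample` (bytes) strictly, or None."""
    for codec in candidates:
        try:
            sample.decode(codec)  # strict mode: raises on any byte the codec rejects
        except UnicodeDecodeError:
            continue  # try the next candidate
        return codec
    return None
```

If the sample is a fixed-size chunk read from the middle of a file, it may end partway through a multibyte character, which makes a strict UTF-8 check fail even on valid data; sampling whole lines avoids that.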
It is supposed to be encoded in UTF-8; apparently it isn't. I just want to skip the lines that are not valid UTF-8.
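Before skipping lines, it can help to look at a few of them to see what the file really contains. A sketch that reads in binary mode and reports each invalid line's number together with the exact bytes the codec rejected, using the `start` and `end` attributes of `UnicodeDecodeError`. The function name and the `limit` parameter are hypothetical.

```python
def report_invalid_lines(path, limit=10):
    """Print up to `limit` lines of `path` that are not valid UTF-8; return how many were printed."""
    found = 0
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as exc:
                bad = raw[exc.start:exc.end]  # the byte(s) the UTF-8 decoder rejected
                print(f"line {lineno}, byte {exc.start}: {bad!r} in {raw!r}")
                found += 1
                if found >= limit:
                    break  # enough examples to diagnose the encoding
    return found
```

Bytes such as `b'\xe9'` sitting between ASCII letters usually point to Latin-1 or cp1252 text mixed into the file.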
Oct-24-2018, 12:35 AM
(This post was last modified: Oct-24-2018, 12:36 AM by Skaperen.)
I just want to keep this simple. The file is a list of every file (full path) that could be installed by every package in the repositories I have configured on my Ubuntu system, along with the name of the package it comes in. I populated a database with it so I can search by file name.
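The post does not show the file's exact layout or the database schema, so this is only a sketch under assumptions: each line is a path, then whitespace, then a package name, and the table `files`, its columns, and the functions `load_index` and `find` are hypothetical. It uses Python's built-in `sqlite3` and stores each path's base name in its own indexed column so that a search by file name does not have to scan every full path.

```python
import os
import sqlite3


def load_index(lines, db=":memory:"):
    """Build a file-name index from lines of the assumed form 'PATH<whitespace>PACKAGE'."""
    conn = sqlite3.connect(db)
    conn.execute("CREATE TABLE IF NOT EXISTS files (name TEXT, path TEXT, package TEXT)")
    conn.execute("CREATE INDEX IF NOT EXISTS files_name ON files (name)")
    rows = []
    for line in lines:
        parts = line.rsplit(None, 1)  # split on the LAST whitespace run, so paths may contain spaces
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        path, package = parts
        rows.append((os.path.basename(path), path, package))
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def find(conn, name):
    """Return (path, package) pairs whose base name is exactly `name`."""
    return conn.execute(
        "SELECT path, package FROM files WHERE name = ? ORDER BY path", (name,)
    ).fetchall()
```

`load_index` accepts any iterable of already-decoded lines, so lines that are not valid UTF-8 can be filtered out before they reach the database.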