Thread: UnicodeDecodeError: (/thread-9460.html) - Python Forum (https://python-forum.io), Forum: Data Science (https://python-forum.io/forum-44.html)

UnicodeDecodeError: - garikhgh0 - Apr-10-2018

Hi. How do I handle this?

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 16: invalid start byte

Thanks a lot.

RE: UnicodeDecodeError: - Gribouillis - Apr-10-2018

You're perhaps trying to decode a string that was not encoded in UTF-8. What do you know about this string? You can try other encodings such as iso8859-1 or cp1252. The chardet module can help you guess the string's encoding.

RE: UnicodeDecodeError: - garikhgh0 - Apr-10-2018

Hi, I downloaded it from my SAS EG. It is in txt format. When I read the first 100 rows it works, but when I try the first 100000 it raises the error.

Rep_Date Item_Name Item_Catalog_name Warehouse Qty Rep_item
0 2016-02-01 ALO Pre A RSC 13 Sales
1 2016-02-01 ALO Pre B RSC 3 Sales
2 2016-02-01 ALo Pre C RSC 2 Sales
3 2016-02-01 ALO Pre D RSC 13 Sales
4 2016-02-01 ALO Pre F RSC 9 Sales

RE: UnicodeDecodeError: - Gribouillis - Apr-10-2018

By bisection, it should be easy to find the first faulty line in the file. Try with the first 50000 lines, and so on, until you can locate and print the faulty line.

RE: UnicodeDecodeError: - garikhgh0 - Apr-10-2018

It says that it is in position 31:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 31: invalid start byte

RE: UnicodeDecodeError: - Gribouillis - Apr-10-2018

(Apr-10-2018, 10:32 AM)garikhgh0 Wrote: it says that it is in position 31

Position 31 in which string? The whole text file, or the current line while the file is being read? Can you post the complete exception traceback?

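The suggestion above to try other encodings such as cp1252 could be sketched roughly as follows. This is only an illustration under stated assumptions: `decode_with_fallback`, the candidate list, and the sample bytes are hypothetical and do not come from the thread. Note that a decode that succeeds does not prove the guess is right, since latin-1 (ISO-8859-1) accepts every byte value, which is why it is placed last.

```python
# Hypothetical helper, not from the thread: try candidate encodings in order.
# latin-1 (ISO-8859-1) maps every byte to a character, so it never fails
# and only makes sense as the last fallback.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def decode_with_fallback(raw, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding_name) for the first encoding that decodes raw."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Made-up sample containing the 0xa0 byte named in the error message.
sample = b"Charger \xa0Black"
text, used = decode_with_fallback(sample)
print(used, ascii(text))
```

If the same encoding turns out to fit the whole file, its name can then be passed as the `encoding` argument when opening or reading the file.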
RE: UnicodeDecodeError: - garikhgh0 - Apr-10-2018

Traceback (most recent call last):
  File "pandas\_libs\parsers.pyx", line 1175, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas\_libs\parsers.pyx", line 1281, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas\_libs\parsers.pyx", line 1297, in pandas._libs.parsers.TextReader._string_convert
  File "pandas\_libs\parsers.pyx", line 1539, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 31: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\garhakobyan\Desktop\My Project_on_Python\reading_675.py", line 4, in <module>
    df = pd.read_csv('R675.csv', encoding = 'utf_8')
  File "C:\Users\garhakobyan\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\io\parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\garhakobyan\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\io\parsers.py", line 455, in _read
    data = parser.read(nrows)
  File "C:\Users\garhakobyan\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\io\parsers.py", line 1069, in read
    ret = self._engine.read(nrows)
  File "C:\Users\garhakobyan\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\io\parsers.py", line 1839, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 902, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 924, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 1001, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 1130, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas\_libs\parsers.pyx", line 1182, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas\_libs\parsers.pyx", line 1281, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas\_libs\parsers.pyx", line 1297, in pandas._libs.parsers.TextReader._string_convert
  File "pandas\_libs\parsers.pyx", line 1539, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 31: invalid start byte
[Finished in 1.5s with exit code 1]
[shell_cmd: python -u "C:\Users\garhakobyan\Desktop\My Project_on_Python\reading_675.py"]
[dir: C:\Users\garhakobyan\Desktop\My Project_on_Python]
[path: C:\ProgramData\Oracle\Java\javapath;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files\SAS\SharedFiles(32)\Formats;C:\Users\garhakobyan\AppData\Local\Programs\Python\Python36-32\Scripts\;C:\Users\garhakobyan\AppData\Local\Programs\Python\Python36-32\]

RE: UnicodeDecodeError: - Gribouillis - Apr-10-2018

You can track the error by running this code (you may need to change the path to the csv file):

    # foo.py
    CSVFILE = 'R675.csv'

    # Read raw bytes so that each line can be decoded and checked on its own.
    with open(CSVFILE, 'rb') as ifh:
        for i, line in enumerate(ifh, 1):
            try:
                s = line.decode('utf-8')
            except UnicodeDecodeError as err:
                # Report the first line that is not valid UTF-8, then stop.
                print('R675.csv: ERROR AT LINE', i, repr(line))
                break

RE: UnicodeDecodeError: - snippsat - Apr-10-2018

You should mention that you use pandas. Do you read it as utf-8? Post your code with a sample of the CSV where the error is.

    import pandas as pd

    df = pd.read_csv('file_name.csv', encoding='utf-8')

The same applies to the code @Grib posted: setting the encoding is an option there too.

    with open(CSVFILE, encoding='utf-8') as ifh:

You can also ignore or replace errors.

    with open(CSVFILE, encoding='utf-8', errors='ignore') as ifh:

RE: UnicodeDecodeError: - garikhgh0 - Apr-10-2018

It gave this:

R675.csv: ERROR AT LINE 10538 b'2018-03-26,HQ Service Center,Handset,Samsung Galaxy S9+ and Charger \xffBlack,1,Sales\r\n'

It worked. I cleared the white space between "Charger" and "Black", but I could not understand how it could affect the reading process when there was nothing there. Is there a method to show the hidden symbols that cause the problem? Thanks a lot :)
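
On the question of showing the hidden symbols: the `repr(line)` output above already exposes one, since the `\xff` between "Charger" and "Black" is a byte that is not valid UTF-8. A possible way to list such bytes more explicitly is sketched below. It is only an illustration: `find_non_ascii` is a hypothetical helper, and cp1252 is an assumed single-byte encoding used purely to give each byte a name, not something the thread confirms.

```python
import unicodedata


def find_non_ascii(raw, encoding="cp1252"):
    """List (offset, hex value, character name) for every byte >= 0x80.

    Hypothetical helper. The name is looked up by decoding the single byte
    with the assumed encoding, so this only makes sense for single-byte
    encodings such as cp1252 or latin-1.
    """
    findings = []
    for offset, byte in enumerate(raw):
        if byte >= 0x80:
            # Bytes undefined in the encoding become U+FFFD (replacement character).
            char = bytes([byte]).decode(encoding, errors="replace")
            findings.append((offset, hex(byte), unicodedata.name(char, "UNKNOWN")))
    return findings


# The faulty line reported in the thread.
line = b'2018-03-26,HQ Service Center,Handset,Samsung Galaxy S9+ and Charger \xffBlack,1,Sales\r\n'
for offset, value, name in find_non_ascii(line):
    print(offset, value, name)
```

Run over each line of a file opened in `'rb'` mode, a check like this would point at every suspicious byte rather than only the first one that stops the decoder.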