Apr-10-2018, 07:05 AM
Pages: 1 2
Apr-10-2018, 08:16 AM
You're perhaps trying to decode a string that was not encoded as UTF-8. What do you know about this string? You can try other encodings such as iso8859-1 or cp1252. The chardet module can help you guess the string's encoding.
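A minimal sketch of the "try other encodings" idea, using only the standard library. The helper name `first_working_encoding` and the candidate list are assumptions made for illustration. Note that iso8859-1 accepts any byte sequence, so it belongs last, and a successful decode does not prove that the guess is right.

```python
def first_working_encoding(data, candidates=("utf-8", "cp1252", "iso8859-1")):
    """Return the first candidate encoding that decodes `data` without error."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this codec rejects the bytes, try the next one
        return name
    return None  # no candidate could decode the data


# Illustrative bytes: 0xa0 is a non-breaking space in cp1252,
# but it is not a valid start byte in UTF-8.
sample = b"Charger \xa0Black"
print(first_working_encoding(sample))
```

With pandas, the chosen name would then be passed as the `encoding` argument of `pd.read_csv`.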
Apr-10-2018, 09:47 AM
Hi, I downloaded it from my SAS EG; it is in txt format. When I read the first 100 rows it works, but when I try the first 100000 it raises the error.
     Rep_Date Item_Name Item_Catalog_name Warehouse  Qty Rep_item
0  2016-02-01       ALO             Pre A       RSC   13    Sales
1  2016-02-01       ALO             Pre B       RSC    3    Sales
2  2016-02-01       ALo             Pre C       RSC    2    Sales
3  2016-02-01       ALO             Pre D       RSC   13    Sales
4  2016-02-01       ALO             Pre F       RSC    9    Sales
Apr-10-2018, 10:00 AM
By bisection, it should be easy to find the first faulty line in the file. Try with the first 50000 lines etc until you can locate and print the faulty line.
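The bisection idea can be sketched as follows. This is only an illustration: the function names `decodes` and `first_bad_line` are hypothetical, and the lines are assumed to have been read in binary mode (for example with `open(path, 'rb').readlines()`). It relies on the fact that once a prefix of the file fails to decode, every longer prefix fails too.

```python
def decodes(lines, n, encoding="utf-8"):
    """Return True if the first n lines decode cleanly as one block."""
    try:
        b"".join(lines[:n]).decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def first_bad_line(lines, encoding="utf-8"):
    """Return the 1-based number of the first undecodable line, or None."""
    if decodes(lines, len(lines), encoding):
        return None
    lo, hi = 0, len(lines)  # invariant: first lo lines decode, first hi lines do not
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if decodes(lines, mid, encoding):
            lo = mid
        else:
            hi = mid
    return hi


# In-memory stand-in for a file read in binary mode (made-up rows).
lines = [b"a,1\n", b"b,2\n", b"c \xff d,3\n", b"e,4\n"]
bad = first_bad_line(lines)
print(bad, repr(lines[bad - 1]))
```

Each probe decodes the whole prefix again, so for a file of this size a single linear pass over the lines is just as practical; the sketch only shows how the halving works.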
Apr-10-2018, 10:32 AM
It says that it is in position 31.
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 31: invalid start byte
Apr-10-2018, 11:40 AM
(Apr-10-2018, 10:32 AM)garikhgh0 Wrote: it says that it is in position 31
Position 31 in which string? The whole text file, or the current line while the file is being read? Can you post the complete exception traceback?
Apr-10-2018, 12:01 PM
Traceback (most recent call last):
  File "pandas\_libs\parsers.pyx", line 1175, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas\_libs\parsers.pyx", line 1281, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas\_libs\parsers.pyx", line 1297, in pandas._libs.parsers.TextReader._string_convert
  File "pandas\_libs\parsers.pyx", line 1539, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 31: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\garhakobyan\Desktop\My Project_on_Python\reading_675.py", line 4, in <module>
    df = pd.read_csv('R675.csv', encoding = 'utf_8')
  File "C:\Users\garhakobyan\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\io\parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\garhakobyan\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\io\parsers.py", line 455, in _read
    data = parser.read(nrows)
  File "C:\Users\garhakobyan\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\io\parsers.py", line 1069, in read
    ret = self._engine.read(nrows)
  File "C:\Users\garhakobyan\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\io\parsers.py", line 1839, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 902, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 924, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 1001, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 1130, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas\_libs\parsers.pyx", line 1182, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas\_libs\parsers.pyx", line 1281, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas\_libs\parsers.pyx", line 1297, in pandas._libs.parsers.TextReader._string_convert
  File "pandas\_libs\parsers.pyx", line 1539, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 31: invalid start byte
[Finished in 1.5s with exit code 1]
[shell_cmd: python -u "C:\Users\garhakobyan\Desktop\My Project_on_Python\reading_675.py"]
[dir: C:\Users\garhakobyan\Desktop\My Project_on_Python]
[path: C:\ProgramData\Oracle\Java\javapath;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files\SAS\SharedFiles(32)\Formats;C:\Users\garhakobyan\AppData\Local\Programs\Python\Python36-32\Scripts\;C:\Users\garhakobyan\AppData\Local\Programs\Python\Python36-32\]
Apr-10-2018, 12:22 PM
You can track the error by running this code (you may need to change the path to the csv file)
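# foo.py
CSVFILE = 'R675.csv'

# Read raw bytes so that decoding happens explicitly, one line at a time.
with open(CSVFILE, 'rb') as ifh:
    for i, line in enumerate(ifh, 1):
        try:
            s = line.decode('utf-8')
        except UnicodeDecodeError as err:
            # Report the first line that is not valid UTF-8 and stop.
            print('R675.csv: ERROR AT LINE', i, repr(line))
            break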
Apr-10-2018, 12:40 PM
You should mention that you use pandas.
Do you read it as utf-8?
Post your code with a sample of the CSV where the error is.
import pandas as pd
df = pd.read_csv('file_name.csv', encoding='utf-8')
Same with the code @Grib has posted; setting the encoding is an option.
with open(CSVFILE, encoding='utf-8') as ifh:
You can also ignore or replace errors.
with open(CSVFILE, encoding='utf-8', errors='ignore') as ifh:
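To make the difference between the error handlers concrete, here is a small sketch on an in-memory byte string. The sample bytes are made up for illustration; the `errors` argument of `open()` accepts the same handler names as `bytes.decode`.

```python
raw = b"Charger \xffBlack"  # illustrative bytes with one invalid UTF-8 byte

# 'strict' is the default handler and raises UnicodeDecodeError on the bad byte.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print("strict:", err.reason)

ignored = raw.decode("utf-8", errors="ignore")    # bad byte is silently dropped
replaced = raw.decode("utf-8", errors="replace")  # bad byte becomes U+FFFD

print(repr(ignored))
print(repr(replaced))
```

Both handlers hide the problem rather than fix it: `ignore` loses data without a trace, while `replace` at least leaves a visible marker where the bad byte was.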
Apr-10-2018, 01:02 PM
it gave this
R675.csv: ERROR AT LINE 10538 b'2018-03-26,HQ Service Center,Handset,Samsung Galaxy S9+ and Charger \xffBlack,1,Sales\r\n'
It worked: I cleared the white space between "Charger" and "Black". But I could not understand how it can affect the reading process when there was nothing there.
Is there a method to show the hidden symbols which cause the problem?
thanks a lot :)
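One possible way to make such hidden bytes visible, sketched with the standard library only. The helper `describe_non_ascii` is hypothetical, the sample line is adapted from the error output above with the 0xa0 byte from the traceback, and naming the character assumes a single-byte encoding such as cp1252, which is a guess about the file.

```python
import unicodedata


def describe_non_ascii(line, encoding="cp1252"):
    """List (offset, byte value, character name) for every non-ASCII byte in `line`."""
    found = []
    for offset, byte in enumerate(line):
        if byte > 0x7F:
            # Assumption: a single-byte encoding, so each byte maps to one character.
            char = bytes([byte]).decode(encoding, errors="replace")
            found.append((offset, hex(byte), unicodedata.name(char, "UNKNOWN")))
    return found


line = b"Samsung Galaxy S9+ and Charger \xa0Black"
print(repr(line))                # repr() of bytes shows non-ASCII bytes as \x escapes
print(describe_non_ascii(line))
```

Under that assumption, byte 0xa0 is a no-break space, which looks like ordinary white space in most editors.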