UnicodeDecodeError:

**Gribouillis** · Apr-10-2018, 01:32 PM

(Apr-10-2018, 01:02 PM)garikhgh0 Wrote: Is there a method to show the hidden symbols which cause the problem?

The problem is that your program reads the file assuming it is text encoded as utf8. If the file is encoded differently, decoding fails. In your case the lines seem to contain only ASCII characters (apart from this strange \xff), and ASCII is compatible with utf8; the byte \xff, however, never appears in valid utf8. There is no general way to show the symbols that "cause the problem", because it depends on the file's real encoding: a file written in Japanese or Korean, for example, may use a more suitable encoding, and any attempt to read it as utf8 will fail.
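
What you can do is locate the bytes that are not valid utf8. The following is only a minimal sketch: the function name `find_undecodable_lines` and the sample data are made up for illustration, and it assumes the file is small enough to read into memory as bytes (for example with `open(path, 'rb').read()`).

```python
def find_undecodable_lines(raw, encoding='utf-8'):
    """Return (line_number, offset_in_line, offending_bytes) for each bad line."""
    problems = []
    for line_number, line in enumerate(raw.splitlines(keepends=True), start=1):
        try:
            line.decode(encoding)
        except UnicodeDecodeError as exc:
            # exc.start and exc.end delimit the bytes the codec rejected
            problems.append((line_number, exc.start, line[exc.start:exc.end]))
    return problems


# Hypothetical sample data; in practice read the bytes from your own file.
sample = b'ok line\r\nSamsung Galaxy S9+ and Charger \xffBlack\r\n'
for line_number, offset, bad in find_undecodable_lines(sample):
    print(f'line {line_number}, offset {offset}: {bad!r}')
```

This reports only the first rejected byte sequence on each line, and it tells you where decoding fails, not what the correct encoding is.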

***snippsat*** · (This post was last modified: Apr-10-2018, 01:51 PM by snippsat.)

That's fine if you can go into the file and fix it.
Here are some ways to get around the problem.

>>> s = b'2018-03-26,HQ Service Center,Handset,Samsung Galaxy S9+ and Charger \xffBlack,1,Sales\r\n'
>>> s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 68: invalid start byte

>>> # Try other codec
>>> s.decode('latin-1')
'2018-03-26,HQ Service Center,Handset,Samsung Galaxy S9+ and Charger ÿBlack,1,Sales\r\n'

>>> # Just ignore error
>>> s.decode('utf-8', errors='ignore')
'2018-03-26,HQ Service Center,Handset,Samsung Galaxy S9+ and Charger Black,1,Sales\r\n'

>>> # Replace each undecodable byte with the replacement character (�)
>>> s.decode('utf-8', errors='replace')
'2018-03-26,HQ Service Center,Handset,Samsung Galaxy S9+ and Charger �Black,1,Sales\r\n'
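
The same error handlers can be passed to the built-in `open()` when reading a file in text mode. A rough sketch follows; the temporary file and the shortened sample line are only there to keep the example self-contained. The `backslashreplace` handler (available for decoding since Python 3.5) is useful when you want to see the offending byte instead of dropping it.

```python
import os
import tempfile

raw = b'Samsung Galaxy S9+ and Charger \xffBlack,1,Sales\r\n'

# Write the sample bytes to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False, suffix='.csv') as tmp:
    tmp.write(raw)
    path = tmp.name

# errors='replace' substitutes U+FFFD for each undecodable byte.
with open(path, encoding='utf-8', errors='replace', newline='') as infile:
    replaced = infile.read()

# errors='backslashreplace' keeps the bad byte visible as an escape sequence.
with open(path, encoding='utf-8', errors='backslashreplace', newline='') as infile:
    escaped = infile.read()

os.remove(path)
print(escaped)
```

Both handlers hide the real problem (an unknown or wrong encoding), so they are best suited to inspection or to data where losing a character does not matter.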

garikhgh0 · Apr-10-2018, 01:54 PM

thanks a lot

garikhgh0 · May-16-2018, 01:46 PM

Hi, friend.

If I find such errors in my file, how do I handle them?

R675.csv: ERROR AT LINE 1009 b'"2016-02-01","Htech 6605 Wired\xa0 selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"\r\n'

***snippsat*** · May-16-2018, 03:09 PM

(May-16-2018, 01:46 PM)garikhgh0 Wrote: If I find such errors in my file, how do I handle them?

What are you using, and how do you read the file?

\xa0 is the non-breaking space in Latin-1 (ISO 8859-1).
To read it as utf-8 you can just ignore the error, as posted before.

>>> s = b'"2016-02-01","Htech 6605 Wired\xa0 selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"\r\n'
>>> print(s)
b'"2016-02-01","Htech 6605 Wired\xa0 selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"\r\n'
>>> type(s)
<class 'bytes'>
>>> print(s.decode('utf-8', errors='ignore'))
"2016-02-01","Htech 6605 Wired selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"

>>> # latin-1 will work
>>> print(s.decode('latin-1'))
"2016-02-01","Htech 6605 Wired  selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"

You can also remove the byte first and then decode as utf-8.

>>> s = b'"2016-02-01","Htech 6605 Wired\xa0 selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"\r\n'
>>> new_string = s.replace(b'\xa0', b'')
>>> print(new_string.decode('utf-8'))
"2016-02-01","Htech 6605 Wired selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"

PythonGuy888 · May-31-2018, 03:35 PM

I have a similar error that came up.
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 214: character maps to <undefined>
This is what IDLE gave me at the line:
bytesToSend = f.read(1024)

Does anyone know how to fix this?

PythonGuy888 · May-31-2018, 05:09 PM

I had the same issue...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xef in position 15: invalid continuation byte

HOW DO U FIX DIS???

garikhgh0 · Jun-04-2018, 05:41 AM

Hi. I had the same error. I got around it by passing encoding='latin-1' when reading the file:

import pandas as pd
df = pd.read_csv('file.csv', encoding='latin-1')

volcano63 · Jun-04-2018, 08:41 PM

If you don't know the encoding, you can use the chardet module to detect it: first read the file content as a bytestring, then check the encoding, and either decode the data or read the file anew.

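import chardet

with open('file.csv', 'rb') as infile:  # placeholder file name
    content = infile.read()
# detect() returns a dict; its 'encoding' entry can be None if detection fails
data = content.decode(chardet.detect(content)['encoding'])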
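
Detection is a guess and can fail or be wrong, so a fallback that uses only the standard library may be handy. The sketch below is an assumption about how one might do it, not part of chardet: the helper name `decode_with_fallback` and the candidate list are made up for illustration. latin-1 goes last because it maps every byte to a character and therefore never raises, even when it is the wrong encoding.

```python
def decode_with_fallback(raw, candidates=('utf-8', 'cp1252', 'latin-1')):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate encodings could decode the data')


text, used = decode_with_fallback(b'Htech 6605 Wired\xa0 selfie')
print(used)
```

A successful decode only means the bytes are valid in that encoding; the resulting text can still be wrong if the file was written with a different one.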
