Python Forum
UnicodeDecodeError:
#11
(Apr-10-2018, 01:02 PM)garikhgh0 Wrote: Is there a method to show the hidden symbols which cause the problem?
The problem is that your program reads the file on the assumption that it is text encoded in UTF-8. If the file is encoded differently, decoding fails. In your case, the lines appear to be plain ASCII (apart from the stray \xff byte), and ASCII is compatible with UTF-8, so only that byte is rejected. There is no general way to show the symbols that "cause the problem", because what a byte means depends on the encoding the file was written in: a file in Japanese or Korean, for example, may use a different, more suitable encoding, and any attempt to read it as UTF-8 will fail.
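Although the intended characters cannot be recovered without knowing the real encoding, it is at least possible to locate the bytes that are invalid under an assumed encoding. Here is a minimal sketch that relies on the `start` and `end` attributes of `UnicodeDecodeError`. The helper name `find_undecodable` is made up for illustration, and the sample line is a shortened version of the one from this thread.

```python
def find_undecodable(data: bytes, encoding: str = "utf-8"):
    """Return (offset, raw_bytes) pairs for spans that fail to decode."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode(encoding)
            break  # the rest of the data decodes cleanly
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are offsets into the slice we decoded
            start = pos + exc.start
            end = pos + exc.end
            bad.append((start, data[start:end]))
            pos = end  # continue scanning after the bad span
    return bad


line = b"Samsung Galaxy S9+ and Charger \xffBlack,1,Sales\r\n"
for offset, raw in find_undecodable(line):
    print(offset, raw)
```

This only reports where decoding fails. It does not tell you which character the author meant.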
Reply
#12
That's fine if you can go into the file and fix it.
Here are some ways to get around the problem.
>>> s = b'2018-03-26,HQ Service Center,Handset,Samsung Galaxy S9+ and Charger \xffBlack,1,Sales\r\n'
>>> s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 68: invalid start byte

>>> # Try another codec
>>> s.decode('latin-1')
'2018-03-26,HQ Service Center,Handset,Samsung Galaxy S9+ and Charger ÿBlack,1,Sales\r\n'

>>> # Just ignore the error (the bad byte is dropped)
>>> s.decode('utf-8', errors='ignore')
'2018-03-26,HQ Service Center,Handset,Samsung Galaxy S9+ and Charger Black,1,Sales\r\n'

>>> # Replace each undecodable byte with U+FFFD, the replacement character
>>> s.decode('utf-8', errors='replace')
'2018-03-26,HQ Service Center,Handset,Samsung Galaxy S9+ and Charger �Black,1,Sales\r\n'
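The same `errors` argument can also be passed to `open()`, so the whole file is decoded leniently without calling `decode` by hand. This is a rough sketch that writes a shortened sample line to a temporary file purely to stay self-contained. In real use `path` would be your own file.

```python
import os
import tempfile

raw = b"Charger \xffBlack,1,Sales\r\n"

# Write the sample bytes to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # errors="replace" substitutes U+FFFD for each undecodable byte;
    # newline="" keeps the original \r\n line ending untouched.
    with open(path, encoding="utf-8", errors="replace", newline="") as f:
        text = f.read()
finally:
    os.remove(path)

print(repr(text))
```

Both `ignore` and `replace` lose information, so they are best kept for cases where the damaged characters do not matter.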
Reply
#13
thanks a lot
Reply
#14
Hi, friend.

If I find such errors in my file, how should I handle them?

R675.csv: ERROR AT LINE 1009 b'"2016-02-01","Htech 6605 Wired\xa0 selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"\r\n'
Reply
#15
(May-16-2018, 01:46 PM)garikhgh0 Wrote: If I find such errors in my file, how should I handle them?
What are you using, and how do you read the file?

\xa0 is the non-breaking space in Latin-1 (ISO 8859-1).
To read the line as UTF-8, you can just ignore the error, as posted before.
>>> s = b'"2016-02-01","Htech 6605 Wired\xa0 selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"\r\n'
>>> print(s)
b'"2016-02-01","Htech 6605 Wired\xa0 selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"\r\n'
>>> type(s)
<class 'bytes'>
>>> print(s.decode('utf-8', errors='ignore'))
"2016-02-01","Htech 6605 Wired selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"

>>> # latin-1 will work
>>> print(s.decode('latin-1'))
"2016-02-01","Htech 6605 Wired  selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"
You can also remove the byte first and then decode as UTF-8.
>>> s = b'"2016-02-01","Htech 6605 Wired\xa0 selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"\r\n'
>>> new_string = s.replace(b'\xa0', b'')
>>> print(new_string.decode('utf-8'))
"2016-02-01","Htech 6605 Wired selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"
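If the file really is Latin-1 (an assumption, since nothing in the bytes proves it), another option is to decode first and then turn the non-breaking space into an ordinary space instead of deleting it. A small sketch on a shortened copy of the line:

```python
s = b'"2016-02-01","Htech 6605 Wired\xa0 selfie HV-BTM 11"\r\n'

# Decode as Latin-1 so \xa0 becomes U+00A0 (non-breaking space),
# then swap it for an ordinary space.
text = s.decode("latin-1").replace("\u00a0", " ")
print(repr(text))
```

This keeps the word spacing intact, at the cost of leaving two spaces where the original had a non-breaking space followed by a normal one.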
Reply
#16
I ran into a similar error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 214: character maps to <undefined>
This is what IDLE gave me at the line:
bytesToSend = f.read(1024)

Does anyone know how to fix this?
Reply
#17
I had the same issue...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xef in position 15: invalid continuation byte


How do you fix this?
Reply
#18
Hi. I had the same error. I got past it by passing encoding='latin-1' when reading the file:


import pandas as pd
df = pd.read_csv('file.csv', encoding='latin-1')
Reply
#19
If you don't know the encoding, you can use the chardet module to detect it: first read the file content as a byte string, then check the encoding, and either decode the data or read the file anew.

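import chardet  # third-party package: pip install chardet

# Replace <file> with the path to your file.
with open(<file>, 'rb') as infile:
    content = infile.read()
data = content.decode(chardet.detect(content)['encoding'])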
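If installing chardet is not an option, a cruder standard-library approach is to try a short list of candidate encodings in order. This is only a sketch: the helper name `decode_with_fallback` and the candidate list are assumptions, and because latin-1 accepts every byte, it only works as a last resort and may produce the wrong characters.

```python
def decode_with_fallback(data: bytes, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError("none of the candidate encodings worked")


text, used = decode_with_fallback(b"Wired\xa0 selfie")
print(used)
```

Unlike chardet, this does no statistical guessing. It simply reports the first encoding that raises no error.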
Reply

