UnicodeDecodeError:
#11
(Apr-10-2018, 01:02 PM)garikhgh0 Wrote: Is there a method to show the hidden symbols which cause the problem?
The problem is that your program reads the file assuming it is text encoded in UTF-8. If the file uses a different encoding, decoding fails. In your case the lines seem to be plain ASCII (apart from this strange \xff), and ASCII is a subset of UTF-8, so only that byte causes trouble. There is no general way to show the symbols that "cause the problem", because the cause depends on the situation: if your file is written in Japanese or Korean, for example, it may use an encoding better suited to that language, and decoding it as UTF-8 will fail.
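What Python can tell you is *where* decoding failed: a `UnicodeDecodeError` carries `start` and `end` attributes marking the bytes it could not decode. A minimal sketch, assuming the whole file fits in memory; `find_bad_bytes` is a hypothetical helper name, and it reports only the first bad byte range per line, because `decode()` stops at the first error.

```python
def find_bad_bytes(data: bytes, encoding: str = "utf-8"):
    """Return (line_number, column, offending_bytes) for each line that fails to decode."""
    problems = []
    # keepends=True keeps b'\r\n', so columns match the raw bytes
    for lineno, line in enumerate(data.splitlines(keepends=True), start=1):
        try:
            line.decode(encoding)
        except UnicodeDecodeError as exc:
            # exc.start/exc.end delimit the first undecodable range in this line
            problems.append((lineno, exc.start, line[exc.start:exc.end]))
    return problems


# Hypothetical sample: a header line plus the line from this thread
sample = (b"Date,Store,Category,Item,Qty,Type\r\n"
          b"2018-03-26,HQ Service Center,Handset,Samsung Galaxy S9+ and Charger \xffBlack,1,Sales\r\n")

for lineno, col, bad in find_bad_bytes(sample):
    print(f"line {lineno}, column {col}: {bad!r}")  # line 2, column 68: b'\xff'
```

For a real file, read it in binary mode first, e.g. `with open(path, "rb") as f: find_bad_bytes(f.read())`.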
#12
That's fine if you can go into the file and fix it.
Here are some ways to get around the problem.
>>> s = b'2018-03-26,HQ Service Center,Handset,Samsung Galaxy S9+ and Charger \xffBlack,1,Sales\r\n'
>>> s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 68: invalid start byte

>>> # Try other codec
>>> s.decode('latin-1')
'2018-03-26,HQ Service Center,Handset,Samsung Galaxy S9+ and Charger ÿBlack,1,Sales\r\n'

>>> # Or just ignore the error (the bad byte is dropped)
>>> s.decode('utf-8', errors='ignore')
'2018-03-26,HQ Service Center,Handset,Samsung Galaxy S9+ and Charger Black,1,Sales\r\n'

>>> # Replace the bad byte with U+FFFD, the Unicode replacement character
>>> s.decode('utf-8', errors='replace')
'2018-03-26,HQ Service Center,Handset,Samsung Galaxy S9+ and Charger �Black,1,Sales\r\n'
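The options above can be combined into a fallback: try strict UTF-8 first and only fall back to Latin-1 when that fails. A minimal sketch; `decode_with_fallback` is a hypothetical helper. Latin-1 maps every one of the 256 byte values to a character, so it never raises, which also means it can silently produce wrong characters if the data is really in some other encoding (cp1252, for instance, differs from Latin-1 in the 0x80 to 0x9F range).

```python
def decode_with_fallback(data: bytes, encodings=("utf-8", "latin-1")):
    """Try each encoding in order and return (text, encoding_used)."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this encoding failed, try the next one
    # Only reachable if latin-1 (which accepts any byte) was left out of the list
    raise ValueError("none of the encodings could decode the data")


text, used = decode_with_fallback(b"Charger \xffBlack")
print(used)  # latin-1
text, used = decode_with_fallback(b"caf\xc3\xa9")
print(used)  # utf-8
```

Returning the encoding that was used lets you log which lines needed the fallback instead of hiding it.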
#13
Thanks a lot!
#14
Hi, friend.

If I find such errors in my file, how should I handle them?

R675.csv: ERROR AT LINE 1009 b'"2016-02-01","Htech 6605 Wired\xa0 selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"\r\n'
#15
(May-16-2018, 01:46 PM)garikhgh0 Wrote: If I find such errors in my file, how should I handle them?
What are you using, and how do you read the file?

\xa0 is the non-breaking space in Latin-1 (ISO 8859-1).
To read it as UTF-8, you can simply ignore the error, as shown in the previous post.
>>> s = b'"2016-02-01","Htech 6605 Wired\xa0 selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"\r\n'
>>> print(s)
b'"2016-02-01","Htech 6605 Wired\xa0 selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"\r\n'
>>> type(s)
<class 'bytes'>
>>> print(s.decode('utf-8', errors='ignore'))
"2016-02-01","Htech 6605 Wired selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"

>>> # latin-1 will work
>>> print(s.decode('latin-1'))
"2016-02-01","Htech 6605 Wired  selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"
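Why a lone `\xa0` breaks UTF-8: in Latin-1 the non-breaking space U+00A0 is the single byte 0xA0, but UTF-8 encodes the same character as two bytes, 0xC2 0xA0, so a bare 0xA0 is not valid UTF-8. A minimal sketch, assuming the line really is Latin-1 and you would rather keep the space than drop it; `latin1_line_to_text` is a hypothetical helper name.

```python
raw = b'"2016-02-01","Htech 6605 Wired\xa0 selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"\r\n'

# The same character, U+00A0, in the two encodings
print("\u00a0".encode("utf-8"))    # b'\xc2\xa0'
print("\u00a0".encode("latin-1"))  # b'\xa0'


def latin1_line_to_text(line: bytes) -> str:
    """Decode a Latin-1 line and turn no-break spaces into ordinary spaces."""
    return line.decode("latin-1").replace("\u00a0", " ")


# Transcoding: Latin-1 bytes -> str -> valid UTF-8 bytes (0xA0 becomes 0xC2 0xA0)
utf8_bytes = raw.decode("latin-1").encode("utf-8")
print(latin1_line_to_text(raw).rstrip("\r\n"))
```

The transcoding line is one way to rewrite the file as proper UTF-8 so later reads need no special handling.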
You can also remove the byte with replace() and then decode as UTF-8.
>>> s = b'"2016-02-01","Htech 6605 Wired\xa0 selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"\r\n'
>>> new_string = s.replace(b'\xa0', b'')
>>> print(new_string.decode('utf-8'))
"2016-02-01","Htech 6605 Wired selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"
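If you read the whole file as CSV, the same choices can be passed straight to `open()` through its `encoding` and `errors` parameters, so you never handle bytes yourself. A minimal sketch, assuming the file is read with the standard `csv` module; `read_rows` is a hypothetical helper, and the temporary file just stands in for a real file such as R675.csv.

```python
import csv
import os
import tempfile


def read_rows(path, encoding="utf-8", errors="strict"):
    """Read all CSV rows; encoding and errors are passed to open() unchanged."""
    # newline="" is what the csv module documentation recommends for csv.reader
    with open(path, encoding=encoding, errors=errors, newline="") as f:
        return list(csv.reader(f))


# Hypothetical sample file containing the Latin-1 byte 0xA0 from this thread
sample = b'"2016-02-01","Htech 6605 Wired\xa0 selfie HV-BTM 11","Accessories","Dalma RSC","1","Sales"\r\n'
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(sample)
    path = tmp.name

strict_error = None
try:
    try:
        read_rows(path)  # default: strict UTF-8, raises on the 0xA0 byte
    except UnicodeDecodeError as exc:
        strict_error = exc.reason
    rows_latin1 = read_rows(path, encoding="latin-1")           # keeps U+00A0
    rows_ignore = read_rows(path, encoding="utf-8", errors="ignore")  # drops it
finally:
    os.remove(path)

print(strict_error)       # invalid start byte
print(rows_ignore[0][1])  # Htech 6605 Wired selfie HV-BTM 11
```

Which option is right depends on whether the byte carries meaning: Latin-1 keeps it, `errors="ignore"` silently loses it.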