Python Forum
Unicode problem
#4
(Feb-10-2020, 12:22 PM)Hobson Wrote: For various reasons I would prefer not to install anything that does not come as standard
That should not really be a concern nowadays, as pip comes with Python and works in all kinds of environments.

There is also an errors parameter you can use. It will work, but the characters that cannot be encoded go missing.
The options include ignore and replace.
>>> s = 'Birleşik Krallık'
>>> s.encode('iso-8859-1', errors='ignore')
b'Birleik Krallk'
>>> 
>>> s.encode('iso-8859-1', errors='replace')
b'Birle?ik Krall?k'
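Besides ignore and replace, str.encode() also accepts a few other standard error handlers that keep some trace of the original character. A small sketch of how they compare, using the same string as above (the results dict is just a name picked for this example):

```python
s = 'Birleşik Krallık'

# Standard error handlers for str.encode(); each one decides what happens
# to characters that iso-8859-1 cannot represent (here ş and ı).
handlers = ('ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace')
results = {h: s.encode('iso-8859-1', errors=h) for h in handlers}

for name, encoded in results.items():
    print(name, encoded)

# The default handler is 'strict', which raises UnicodeEncodeError.
try:
    s.encode('iso-8859-1')
except UnicodeEncodeError as exc:
    print('strict failed at position', exc.start)
```

xmlcharrefreplace and backslashreplace do not lose information the way ignore and replace do, but the result is only useful if whatever reads the bytes understands those escapes.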
Quote:If not then am I correct in thinking that if the UnicodeEncodeError occurs in your code I would need a second line of the type:
text = text.decode()
to convert text from bytes to a string?
Yes. You only get into trouble because you try to encode to iso-8859-1, and the result of encode() is always bytes.
To get back to a string you must decode.
>>> s = 'hello'
>>> s = s.encode() #Same as encode('utf-8')
>>> s
b'hello'
>>> s.decode() #Same as decode('utf-8') 
'hello'

>>> s = 'hello'
>>> s = s.encode('iso-8859-1') # Give it an encoding other than utf-8
>>> s
b'hello'
>>> s.decode('iso-8859-1') #Or just decode() would work in this case 
'hello'
You only see the difference when there is a non-ASCII character.
>>> s = 'helloø'
>>> s = s.encode() 
>>> s
b'hello\xc3\xb8'
>>> 
>>> s = 'helloø'
>>> s = s.encode('iso-8859-1') 
>>> s
b'hello\xf8'

>>> s.decode() # Decoding as utf-8 now fails
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 5: invalid start byte

>>> s.decode('iso-8859-1') # Same encoding and it works
'helloø'
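If you do not know for sure which encoding the bytes came from, one common approach is to try utf-8 first and fall back to iso-8859-1. A rough sketch, assuming those are the only two candidates; decode_with_fallback is a made-up helper name, not something from the standard library:

```python
def decode_with_fallback(data, encodings=('utf-8', 'iso-8859-1')):
    """Try each encoding in order and return the first successful decode.

    Hypothetical helper for illustration. Note that iso-8859-1 maps every
    possible byte to a character, so it never raises; as the last candidate
    it is a best guess, not proof that the data really was iso-8859-1.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the encodings could decode the data')


utf8_bytes = 'helloø'.encode()                # b'hello\xc3\xb8'
latin1_bytes = 'helloø'.encode('iso-8859-1')  # b'hello\xf8'

print(decode_with_fallback(utf8_bytes))
print(decode_with_fallback(latin1_bytes))
```

The order matters here: trying iso-8859-1 first would "succeed" on the utf-8 bytes too, but give the wrong text ('helloÃ¸'). If you control both ends, it is better to just use the same encoding for encode and decode.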


Messages In This Thread
Unicode problem - by Hobson - Feb-10-2020, 10:47 AM
RE: Unicode problem - by snippsat - Feb-10-2020, 11:28 AM
RE: Unicode problem - by Hobson - Feb-10-2020, 12:22 PM
RE: Unicode problem - by snippsat - Feb-10-2020, 01:51 PM
RE: Unicode problem - by Hobson - Feb-10-2020, 02:59 PM
