Python Forum
utf-8
#6
As @DeaD_EyE mentioned and showed, this is more about converting things than about showing how Unicode is used.
The name str_to_hex_str_with_space is a little overboard, though, @DeaD_EyE.

Some conversions:
>>> code_points = (0x0041, 0x00F6, 0x0416, 0x20AC, 0x1D11E)
>>> uni = [chr(i) for i in code_points]
>>> uni
['A', 'ö', 'Ж', '€', '𝄞']

>>> for c in ''.join(uni): print('U+{:04x}'.format(ord(c)))
U+0041
U+00f6
U+0416
U+20ac
U+1d11e
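
As a rough sketch of how the same code points look once encoded, the snippet below prints the UTF-8 byte length and hex bytes for each one. The helper name utf8_summary is made up for this illustration and is not from the thread.

```python
# Code points taken from the session above.
code_points = (0x0041, 0x00F6, 0x0416, 0x20AC, 0x1D11E)


def utf8_summary(cp):
    """Return (label, UTF-8 bytes) for one code point (hypothetical helper)."""
    return 'U+{:04X}'.format(cp), chr(cp).encode('utf-8')


summary = [utf8_summary(cp) for cp in code_points]
for label, raw in summary:
    # Label, number of bytes in UTF-8, and the bytes as hex.
    print(label, len(raw), raw.hex())
```

The higher the code point, the more bytes UTF-8 needs: one byte for 'A', up to four for the last character.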
The difference in Unicode handling between Python 2 and 3 is easy to show.
In Python 3, str represents a Unicode string; in Python 2, a plain str literal holds bytes.
# Python 3.6
>>> s = '200€ and a ☂'
>>> type(s)
<class 'str'>
>>> s
'200€ and a ☂'
>>> print(s)
200€ and a ☂
# Python 2.7
>>> s = '200€ and a ☂'
>>> s
'200\xe2\x82\xac and a \xe2\x98\x82'
>>> print s
200€ and a ☂
# Have to decode from UTF-8 to get a unicode object
>>> print s.decode('utf-8')
200€ and a ☂
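
For comparison, here is a small Python 3 sketch of the same split between text and bytes. The Python 2.7 repr above corresponds to what Python 3 calls bytes; the variable names are only for illustration.

```python
# Python 3: str is text, bytes is the encoded form.
s = '200€ and a ☂'
raw = s.encode('utf-8')          # text -> bytes

print(type(s).__name__, len(s))      # counts characters
print(type(raw).__name__, len(raw))  # counts bytes; € and ☂ take 3 bytes each
print(raw)

# Decoding with the same codec gives the original text back.
round_trip = raw.decode('utf-8')
print(round_trip == s)
```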

It is important to think about how Unicode gets in and out of Python 3:
decode on the way in, encode on the way out:
with open('some_file', encoding='utf-8') as f:
   print(f.read())
open() also takes an errors parameter for malformed input, such as errors='ignore' or errors='replace'.
with open('some_file', encoding='utf-8', errors='ignore') as f:
   print(f.read())
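
To see what the two error handlers actually do, here is a minimal sketch that writes a deliberately malformed file and reads it back both ways. The sample bytes and the temporary file are assumptions made for the demo, not something from the thread.

```python
import os
import tempfile

# b'\xe9' is 'é' in Latin-1 but is not valid UTF-8 at this position.
bad = b'caf\xe9 200\xe2\x82\xac'

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bad)
    path = tmp.name

try:
    # 'ignore' silently drops the undecodable byte.
    with open(path, encoding='utf-8', errors='ignore') as f:
        ignored = f.read()
    # 'replace' substitutes U+FFFD (the replacement character).
    with open(path, encoding='utf-8', errors='replace') as f:
        replaced = f.read()
finally:
    os.remove(path)

print(repr(ignored))
print(repr(replaced))
```

Both handlers lose information, so they are best kept for cases where the correct encoding really cannot be found.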
Another option is to read the file as bytes ('rb') and then try to decode it.
>>> ch = open('chinese.txt', 'rb').read()
>>> type(ch)
<class 'bytes'>
>>> ch
b'\xef\xbb\xbfhi\xe7\x8c\xab'
>>> print(ch.decode('utf-8')) # is now a string(Unicode) in python 3
hi猫
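
One thing worth noting about the bytes above: they start with a UTF-8 byte order mark (b'\xef\xbb\xbf'). A plain 'utf-8' decode keeps it as an invisible U+FEFF at the start of the string, while the 'utf-8-sig' codec strips it. A small sketch using the same bytes:

```python
# Same bytes as in the session above, including the leading BOM.
raw = b'\xef\xbb\xbfhi\xe7\x8c\xab'

plain = raw.decode('utf-8')      # BOM survives as U+FEFF
clean = raw.decode('utf-8-sig')  # BOM is removed if present

print(repr(plain), len(plain))
print(repr(clean), len(clean))
```

The same codec name works with open(), for example encoding='utf-8-sig'.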
There is also no need to pass 'utf-8' to s.encode() and s.decode() in Python 3; both use UTF-8 by default.
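
A quick check of that default, as a sketch. Note that this applies to str.encode() and bytes.decode(); open() is a different matter, since its default encoding can depend on the platform locale, so passing encoding='utf-8' there is still a good idea.

```python
s = 'hi猫'

# No argument means UTF-8 for both methods in Python 3.
default_bytes = s.encode()
explicit_bytes = s.encode('utf-8')
print(default_bytes == explicit_bytes)

default_text = default_bytes.decode()
print(default_text == s)
```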

If all else fails, use ftfy, which fixes Unicode that’s broken in various ways.
Another tip is to always use Requests when reading from a website;
Requests usually gives back correctly decoded text (urllib does not do that, it returns bytes).


Messages In This Thread
utf-8 - by Skaperen - Jun-25-2017, 04:23 AM
RE: utf-8 - by DeaD_EyE - Jun-25-2017, 03:07 PM
RE: utf-8 - by Skaperen - Jun-26-2017, 03:04 AM
RE: utf-8 - by DeaD_EyE - Jun-26-2017, 03:50 AM
RE: utf-8 - by Skaperen - Jun-27-2017, 02:40 AM
RE: utf-8 - by snippsat - Jun-27-2017, 11:16 AM
RE: utf-8 - by DeaD_EyE - Jun-27-2017, 12:05 PM
RE: utf-8 - by Skaperen - Jun-29-2017, 04:19 AM
RE: utf-8 - by snippsat - Jun-29-2017, 05:01 AM
