As @DeaD_EyE mentioned and showed, this is more about converting stuff than showing the use of Unicode.
The name is a little overboard here, @DeaD_EyE: str_to_hex_str_with_space
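Just to illustrate that a shorter name works as well, here is a rough sketch. I'm assuming the helper hex-dumps the UTF-8 bytes of a string with a space between bytes; the name to_hex and its parameters are my own invention, not @DeaD_EyE's code.

```python
def to_hex(text, encoding='utf-8', sep=' '):
    """Return the encoded bytes of text as two-digit hex values joined by sep."""
    return sep.join('{:02x}'.format(byte) for byte in text.encode(encoding))


print(to_hex('A€'))  # 41 e2 82 ac
```

The euro sign takes three bytes in UTF-8, which is why one character gives three hex pairs.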

Some conversions:

    >>> code_points = (0x0041, 0x00F6, 0x0416, 0x20AC, 0x1D11E)
    >>> uni = [chr(i) for i in code_points]
    >>> uni
    ['A', 'ö', 'Ж', '€', '𝄞']
    >>> for c in ''.join(uni):
    ...     print('U+{:04x}'.format(ord(c)))
    ...
    U+0041
    U+00f6
    U+0416
    U+20ac
    U+1d11e

Showing the difference in Unicode between Python 2 and 3 is easy.
In Python 3, str represents a Unicode string.

    # Python 3.6
    >>> s = '200€ and a ☂'
    >>> type(s)
    <class 'str'>
    >>> s
    '200€ and a ☂'
    >>> print(s)
    200€ and a ☂

    # Python 2.7
    >>> s = '200€ and a ☂'
    >>> s
    '200\xe2\x82\xac and a \xe2\x98\x82'
    >>> print s
    200€ and a ☂
    >>> # Have to decode from utf-8 to get a Unicode string
    >>> print s.decode('utf-8')
    200€ and a ☂
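The same split still exists in Python 3, it is just explicit: str holds code points and bytes holds the encoded form. A small sketch of the round trip, using the same sample string (the variable names are only for illustration):

```python
s = '200€ and a ☂'           # str: a sequence of Unicode code points
b = s.encode('utf-8')         # bytes: the UTF-8 encoded form of s

print(type(s), len(s))        # <class 'str'> 12
print(type(b), len(b))        # <class 'bytes'> 16
print(b)                      # the same byte values Python 2.7 showed for s
print(b.decode('utf-8') == s) # True
```

The length differs because € and ☂ each take three bytes in UTF-8.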
It is important to think about getting Unicode in and out of Python 3:
decode on the way in, encode on the way out.

    with open('some_file', encoding='utf-8') as f:
        print(f.read())

open() also has an errors parameter for coping with a malformed encoded file, such as errors='ignore' or errors='replace'.

    with open('some_file', encoding='utf-8', errors='ignore') as f:
        print(f.read())

Another option is to read the file as bytes (mode rb) and then try to convert.

    >>> ch = open('chinese.txt', 'rb').read()
    >>> type(ch)
    <class 'bytes'>
    >>> ch
    b'\xef\xbb\xbfhi\xe7\x8c\xab'
    >>> print(ch.decode('utf-8')) # is now a string (Unicode) in Python 3
    hi猫

There is also no need to pass 'utf-8' to s.decode() and s.encode(); they use UTF-8 as the default.

If all else fails, use ftfy, which fixes Unicode that’s broken in various ways.
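To show the kind of breakage ftfy is aimed at, here is a stdlib-only sketch of mojibake: UTF-8 bytes decoded with the wrong codec, then repaired by hand. I picked cp1252 as the wrong codec for the demonstration; with real data you usually don't know which mistake was made, and guessing that is the job ftfy's fix_text function takes over.

```python
original = '200€ and a ☂'

# UTF-8 bytes wrongly decoded as Windows-1252 give the classic garbage.
garbled = original.encode('utf-8').decode('cp1252')
print(garbled)    # 200â‚¬ and a â˜‚

# Reversing the exact same mistake restores the text.
repaired = garbled.encode('cp1252').decode('utf-8')
print(repaired)   # 200€ and a ☂
```

The manual fix only works because the wrong codec is known here and every byte happens to be defined in cp1252.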

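Coming back to the errors parameter and the bytes route: the sketch below shows what 'ignore' and 'replace' actually do to a damaged byte string. It also covers the BOM (the \xef\xbb\xbf at the start of the chinese.txt bytes), which plain 'utf-8' keeps as an invisible U+FEFF while the 'utf-8-sig' codec strips it. The broken sample is made up for the demonstration.

```python
raw = b'\xef\xbb\xbfhi\xe7\x8c\xab'    # same bytes as in chinese.txt above

with_bom = raw.decode('utf-8')          # BOM survives as U+FEFF
without_bom = raw.decode('utf-8-sig')   # BOM is stripped
print(repr(with_bom))                   # '\ufeffhi猫'
print(repr(without_bom))                # 'hi猫'

broken = b'hi\xe7\x8c'                  # 猫 with its last byte cut off

ignored = broken.decode('utf-8', errors='ignore')    # drops the bad bytes
replaced = broken.decode('utf-8', errors='replace')  # inserts U+FFFD
print(repr(ignored))                    # 'hi'
print(repr(replaced))                   # 'hi\ufffd'

try:
    broken.decode('utf-8')              # default is errors='strict'
except UnicodeDecodeError as exc:
    print('strict raised:', exc.reason)
```

Both 'ignore' and 'replace' lose data silently, so they are better suited to displaying text than to processing it further.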
Another tip is to always use Requests when reading from a website.
Requests gives the correct encoding back (urllib does not do that).
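A rough side-by-side sketch of the difference. The function names and the fallback parameter are made up for illustration; Requests is a third-party package, so it is imported inside its function and the rest runs without it.

```python
from urllib.request import urlopen


def fetch_text_urllib(url, fallback='utf-8'):
    """urllib hands back bytes; picking a charset and decoding is on you."""
    with urlopen(url) as resp:
        # Charset declared in the Content-Type header, or None if absent.
        charset = resp.headers.get_content_charset() or fallback
        return resp.read().decode(charset, errors='replace')


def fetch_text_requests(url):
    """Requests picks an encoding and decodes the body for you."""
    import requests  # third-party: pip install requests

    r = requests.get(url, timeout=10)
    r.raise_for_status()
    # r.encoding holds the encoding Requests settled on; r.text uses it.
    return r.text
```

Usage would be something like fetch_text_requests('https://example.com'), with the URL being a placeholder. If the server declares the wrong charset, r.encoding can be overridden before reading r.text.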