Python Forum
how to decode UTF-8 in python 3 - Printable Version




how to decode UTF-8 in python 3 - oco - Jun-05-2018

Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> Str.decode(encoding = 'UTF-8',errors = 'strict')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'Str' is not defined
>>> .decode(encoding = 'UTF-8',errors = 'strict')
  File "<stdin>", line 1
    .decode(encoding = 'UTF-8',errors = 'strict')
    ^
SyntaxError: invalid syntax
>>> Str ="123"
>>> Str.decode(encoding = 'UTF-8',errors = 'strict')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'


RE: how to decode UTF-8 in python 3 - DeaD_EyE - Jun-05-2018

  • Use the method decode on bytes to decode to str (unicode)
  • Use the method encode on str to encode to bytes.
  • A str object does not have the method decode. It is already decoded.
  • A bytes object does not have the method encode. It is already encoded.
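
The four points above can be sketched in a few lines. This is only an illustration: the sample string and the variable names are made up, not taken from the original question.

```python
# Illustrative sketch; sample text and names are assumptions.
text = "h\u00e9llo"              # a str ("héllo"): already decoded Unicode
data = text.encode("utf-8")      # str -> bytes
print(data)                      # b'h\xc3\xa9llo'

# bytes -> str gives back the original text
assert data.decode("utf-8") == text

# str has no decode(), bytes has no encode()
print(hasattr(text, "decode"))   # False
print(hasattr(data, "encode"))   # False
```

So the call from the question works once it is made on a bytes object (for example `b"123".decode(encoding="UTF-8", errors="strict")`) rather than on a str.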

Then the standard complaints:
  • don't use upper case in variable names, and avoid names like Str
  • post your code in code tags (BBCode in the forum)



RE: how to decode UTF-8 in python 3 - snippsat - Jun-05-2018

To add a little info to @DeaD_EyE's post.
One of the biggest changes in Python 3 was Unicode.
In Python 3, strings are Unicode by default.
Bytes and strings (Unicode) are totally separate in Python 3 (they cannot be mixed together).
>>> s = b'hello '
>>> w = 'world'
>>> s + w
Traceback (most recent call last):
  File "<string>", line 428, in runcode
  File "<interactive input>", line 1, in <module>
TypeError: can't concat str to bytes

# Decode from bytes to string
>>> s.decode() + w
'hello world'

>>> # The same as
>>> s.decode('utf-8') + w
'hello world'

# One last example
>>> japanese = "桜の花びらたち"
>>> japanese
'桜の花びらたち'
>>> type(japanese)
<class 'str'>
Anything brought in from the outside world must have an encoding to become a string (Unicode) in Python 3.
If you do not give an encoding when taking data in, you get bytes (b'something') or an error.
UTF-8 is the first choice to try, and ideally the one to "always" use.
An in-and-out example:
# Write to disk
japanese = "桜の花びらたち"
with open('jap.txt', 'w', encoding='utf-8') as f_out:
    f_out.write(japanese)
 
# Read from disk
with open('jap.txt', encoding='utf-8') as f:
    print(f.read())
Output:
桜の花びらたち
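
To make the "bytes or an error" point concrete, here is a small sketch that reads the file back in binary mode and decodes it by hand. It borrows the file name jap.txt from the example above but writes to a temporary directory so it runs standalone. The temporary directory and the errors='replace' fallback are assumptions added for illustration, not something the thread recommends.

```python
import os
import tempfile

japanese = "桜の花びらたち"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "jap.txt")

    # Write the text as UTF-8
    with open(path, "w", encoding="utf-8") as f_out:
        f_out.write(japanese)

    # Binary mode ('rb') does no decoding: you get bytes back
    with open(path, "rb") as f:
        raw = f.read()

print(type(raw))  # <class 'bytes'>

# Decoding with the right codec gives the original str
assert raw.decode("utf-8") == japanese

# Decoding with the wrong codec raises under the default errors='strict'
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)

# errors='replace' swaps each undecodable byte for U+FFFD instead of raising
lossy = raw.decode("ascii", errors="replace")
```

The replaced version is lossy (the original characters are gone), so it is mostly useful for logging or debugging, not for data you need to keep.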



RE: how to decode UTF-8 in python 3 - wavic - Jun-05-2018

Perhaps there are some languages for which the encoding has to be specified explicitly.
So one can think of this as more or less a general rule. I am happy that Python can speak in my own language. Cool
There were some issues with Python 3 and Unicode on Windows, but they have been fixed, as far as I know.
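
On the point about the encoding having to be specified explicitly, here is a short sketch. It assumes a Cyrillic sample word and cp1251 as the "other" encoding, both chosen only for illustration. The same text becomes different bytes under different codecs, so whoever reads the bytes has to know which one was used.

```python
word = "Здрасти"                       # sample text, chosen for illustration

utf8_bytes = word.encode("utf-8")      # 2 bytes per Cyrillic letter
cp1251_bytes = word.encode("cp1251")   # 1 byte per Cyrillic letter
print(len(utf8_bytes), len(cp1251_bytes))  # 14 7

# Same text, different bytes
assert utf8_bytes != cp1251_bytes

# The right codec round-trips
assert cp1251_bytes.decode("cp1251") == word

# The wrong codec: strict UTF-8 decoding rejects the cp1251 bytes
try:
    cp1251_bytes.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8")
```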