Python Forum
how to decode UTF-8 in python 3
#1
Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> Str.decode(encoding = 'UTF-8',errors = 'strict')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'Str' is not defined
>>> .decode(encoding = 'UTF-8',errors = 'strict')
File "<stdin>", line 1
.decode(encoding = 'UTF-8',errors = 'strict')
^
SyntaxError: invalid syntax
>>> Str ="123"
>>> Str.decode(encoding = 'UTF-8',errors = 'strict')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
#2
  • Use the method decode on bytes to decode to str (unicode)
  • Use the method encode on str to encode to bytes.
  • A str object does not have the method decode. It is already decoded.
  • A bytes object does not have the method encode. It is already encoded.
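As a minimal sketch of the four points above, applied to the value from the first post (the variable names text, data and restored are only illustrative):

```python
text = "123"                 # a str: already decoded Unicode text
data = text.encode("utf-8")  # str -> bytes
print(data)                  # b'123'

# decode() lives on bytes, with the same arguments the first post tried on a str
restored = data.decode(encoding="utf-8", errors="strict")  # bytes -> str
print(restored == text)      # True

print(hasattr(text, "decode"))  # False: a str has no decode()
print(hasattr(data, "encode"))  # False: bytes has no encode()
```

So the call from the first post works once it is made on a bytes object instead of a str.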

Then the standard complaints:
  • don't use upper case in variable names, and avoid names like Str
  • post your code in code tags (BBCode in the forum)
#3
To add a little info to @DeaD_EyE's post.
One of the biggest changes in Python 3 was Unicode.
In Python 3, strings are Unicode by default.
Bytes and strings (Unicode) are totally separate in Python 3 (they cannot be mixed together).
>>> s = b'hello '
>>> w = 'world'
>>> s + w
Traceback (most recent call last):
  File "<string>", line 428, in runcode
  File "<interactive input>", line 1, in <module>
TypeError: can't concat str to bytes

# Decode from bytes to string
>>> s.decode() + w
'hello world'

>>> # The same as
>>> s.decode('utf-8') + w
'hello world'

# One last example
>>> japanese = "桜の花びらたち"
>>> japanese
'桜の花びらたち'
>>> type(japanese)
<class 'str'>
Anything brought in from the outside world (files, network, and so on) must be decoded with an encoding to become a string (Unicode) in Python 3.
If no encoding is applied, the data stays as bytes (b'something'), and decoding with the wrong encoding can raise an error.
UTF-8 is the first encoding to try and, ideally, the one to use everywhere.
An in-and-out example:
# Write to disk
japanese = "桜の花びらたち"
with open('jap.txt', 'w', encoding='utf-8') as f_out:
    f_out.write(japanese)
 
# Read from disk
with open('jap.txt', encoding='utf-8') as f:
    print(f.read())
Output:
桜の花びらたち
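To show the bytes side of the same round trip, here is a small sketch. It assumes a scratch file in a temporary directory (the path is made up for the example), reads it back in binary mode, and decodes by hand:

```python
import os
import tempfile

japanese = "桜の花びらたち"

# Hypothetical scratch file, only for this example
path = os.path.join(tempfile.mkdtemp(), "jap.txt")
with open(path, "w", encoding="utf-8") as f_out:
    f_out.write(japanese)

# Binary mode: nothing is decoded, so read() returns bytes
with open(path, "rb") as f:
    raw = f.read()
print(type(raw))

# Decode explicitly with the encoding the file was written in
decoded = raw.decode("utf-8")
print(decoded == japanese)

# Decoding with a codec that cannot represent the data raises an error
ascii_failed = False
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_failed = True
    print("decode failed:", exc.reason)
```

Opening in text mode with encoding='utf-8', as in the example above, does the decode step for you.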
#4
Perhaps for some languages the encoding has to be specified explicitly, so one can treat this as a more or less general rule.
I am happy that Python can speak my own language. Cool
There were some issues with Python 3 and Unicode on Windows, but as far as I know they have been fixed.

