Python Forum
Does str type support multibyte characters?
#1
For example, if the first character in a string variable is multibyte, will var[0] return the whole character, or only the code stored in its first byte?
#2
Did you test it? The interactive console is great for this.
#3
Which version are you using? Python 3 (which you should be using) made big changes to Unicode handling.
Bytes and Unicode text are completely separate types in Python 3 and cannot be mixed together.
So in Python 3, strings are Unicode by default.
>>> japanese = "桜の花びらたち"
>>> japanese
'桜の花びらたち'
>>> japanese[0]
'桜'
>>> japanese[1]
'の'
Single-byte or multibyte should be no concern of yours; this is handled internally by Python. Indexing a str gives you whole characters (code points), never individual bytes.
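To make the difference concrete, here is a minimal sketch that compares indexing a str with indexing the same text encoded to bytes. The variable names are only for illustration, and UTF-8 is assumed as the encoding.

```python
text = "桜の花"                 # str: a sequence of Unicode code points
data = text.encode("utf-8")    # bytes: the same text as raw UTF-8 bytes

first_char = text[0]           # indexing a str gives a one-character str
first_byte = data[0]           # indexing bytes gives an int (0-255)

print(len(text), len(data))    # number of characters vs. number of bytes
print(repr(first_char), hex(first_byte))

# The first character occupies three bytes in UTF-8; decoding them restores it.
print(data[:3].decode("utf-8"))
```

So you only ever see individual bytes if you explicitly work with a bytes object.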
PEP-393: Flexible String Representation
Quote: Python 3.3 switched to a new internal representation, using the most compact form needed to represent all characters in a string.
Either 1 byte, 2 bytes or 4 bytes are picked.
ASCII and Latin-1 text uses just 1 byte per character,
the rest of the BMP characters require 2 bytes and after that 4 bytes is used.
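If you are curious, the 1/2/4-byte storage can be observed roughly with sys.getsizeof. This is only a sketch and assumes CPython 3.3 or newer, since it looks at an implementation detail; bytes_per_char is a made-up helper name, not something from the standard library.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per character (CPython-specific).

    Comparing two strings of different lengths cancels out the
    fixed object overhead, leaving only the per-character cost.
    """
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) / n

samples = [
    "a",           # ASCII
    "\u00e9",      # é, Latin-1 range
    "\u685c",      # 桜, elsewhere in the BMP
    "\U0001f600",  # 😀, outside the BMP
]

for ch in samples:
    print(f"U+{ord(ch):04X}: {bytes_per_char(ch)} bytes per character")
```

None of this changes how indexing behaves; it only affects how much memory a string takes.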
When moving text in and out of Python 3, always specify an encoding, and in almost all cases use UTF-8.
If you do that, you get back the same string, and everything works as shown above.
# Write to disk
japanese = "桜の花びらたち"
with open('jap.txt', 'w', encoding='utf-8') as f_out:
    f_out.write(japanese)

# Read from disk
with open('jap.txt', encoding='utf-8') as f:
    print(f.read())
Output:
桜の花びらたち