Python Forum
python2 string formatting - old and new - different for unicode
From the docs:
Quote:
Converting to Bytes

The opposite method of bytes.decode() is str.encode(), which returns a bytes representation of the Unicode string, encoded in the requested encoding.
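A minimal Python 3 sketch of that round trip, reusing the sample string from the HOWTO example. The names `text`, `data` and `restored` are just illustrative:

```python
# Same sample string as the HOWTO example: U+A000, 'abcd', U+07B4.
text = chr(40960) + 'abcd' + chr(1972)

# str -> bytes: encode with an explicit encoding.
data = text.encode('utf-8')
assert isinstance(data, bytes)

# bytes -> str: decode with the SAME encoding to get the original text back.
restored = data.decode('utf-8')

print(data)              # b'\xea\x80\x80abcd\xde\xb4'
print(restored == text)  # True
```

Decoding with a different codec than the one used for encoding either fails or silently produces the wrong text, so it is worth keeping the encoding name in one place.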

The errors parameter is the same as the parameter of the decode() method but supports a few more possible handlers. As well as 'strict', 'ignore', and 'replace' (which in this case inserts a question mark instead of the unencodable character), there is also 'xmlcharrefreplace' (inserts an XML character reference), 'backslashreplace' (inserts a \uNNNN escape sequence) and 'namereplace' (inserts a \N{...} escape sequence).

The following example shows the different results:

>>> u = chr(40960) + 'abcd' + chr(1972)
>>> u.encode('utf-8')
b'\xea\x80\x80abcd\xde\xb4'
>>> u.encode('ascii')  
Traceback (most recent call last):
   ...
UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in
 position 0: ordinal not in range(128)
>>> u.encode('ascii', 'ignore')
b'abcd'
>>> u.encode('ascii', 'replace')
b'?abcd?'
>>> u.encode('ascii', 'xmlcharrefreplace')
b'&#40960;abcd&#1972;'
>>> u.encode('ascii', 'backslashreplace')
b'\\ua000abcd\\u07b4'
>>> u.encode('ascii', 'namereplace')
b'\\N{YI SYLLABLE IT}abcd\\u07b4'

The low-level routines for registering and accessing the available encodings are found in the codecs module. Implementing new encodings also requires understanding the codecs module. However, the encoding and decoding functions returned by this module are usually more low-level than is comfortable, and writing new encodings is a specialized task, so the module won’t be covered in this HOWTO.
https://docs.python.org/3/howto/unicode.html
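The codecs module mentioned at the end of the quote is also where custom error handlers are registered. A minimal Python 3 sketch using codecs.register_error() and codecs.lookup(); the handler name 'anglecodepoint' and the <U+XXXX> replacement format are hypothetical, chosen only for illustration:

```python
import codecs

def angle_codepoint(exc):
    """Hypothetical handler: replace each unencodable character with <U+XXXX>."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc  # only encoding errors are handled here
    bad = exc.object[exc.start:exc.end]  # the run of characters that failed
    replacement = ''.join('<U+%04X>' % ord(ch) for ch in bad)
    # Return the replacement text and the position to resume encoding from.
    return replacement, exc.end

# Register under a (hypothetical) name so it can be passed as errors=...
codecs.register_error('anglecodepoint', angle_codepoint)

u = chr(40960) + 'abcd' + chr(1972)
result = u.encode('ascii', 'anglecodepoint')
print(result)  # b'<U+A000>abcd<U+07B4>'

# lookup() returns a CodecInfo object describing a registered encoding.
info = codecs.lookup('UTF8')
print(info.name)  # utf-8
```

Handler names are registered globally for the process, so a distinctive name helps avoid clashing with other code.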


RE: python2 string formatting - old and new - different for unicode - by Larz60+ - May-16-2017, 10:13 PM
