python2 string formatting - old and new - different for unicode

**Larz60+** · (This post was last modified: May-16-2017, 10:14 PM by Larz60+.)

From the docs:

Quote:

Converting to Bytes

The opposite method of bytes.decode() is str.encode(), which returns a bytes representation of the Unicode string, encoded in the requested encoding.

The errors parameter is the same as the parameter of the decode() method but supports a few more possible handlers. As well as 'strict', 'ignore', and 'replace' (which in this case inserts a question mark instead of the unencodable character), there is also 'xmlcharrefreplace' (inserts an XML character reference), 'backslashreplace' (inserts a \uNNNN escape sequence) and 'namereplace' (inserts a \N{...} escape sequence).

The following example shows the different results:

>>> u = chr(40960) + 'abcd' + chr(1972)
>>> u.encode('utf-8')
b'\xea\x80\x80abcd\xde\xb4'
>>> u.encode('ascii')
Traceback (most recent call last):
...
UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in
position 0: ordinal not in range(128)
>>> u.encode('ascii', 'ignore')
b'abcd'
>>> u.encode('ascii', 'replace')
b'?abcd?'
>>> u.encode('ascii', 'xmlcharrefreplace')
b'&#40960;abcd&#1972;'
>>> u.encode('ascii', 'backslashreplace')
b'\\ua000abcd\\u07b4'
>>> u.encode('ascii', 'namereplace')
b'\\N{YI SYLLABLE IT}abcd\\u07b4'
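If you want to try the handlers from the quoted example yourself, here is a minimal sketch (mine, not from the docs) that runs them in a loop under Python 3. The names `handlers` and `results` are just illustrative. It also checks that UTF-8 round-trips the string without loss, since `bytes.decode()` is the inverse of `str.encode()`.

```python
# Same sample string as in the quoted example.
u = chr(40960) + 'abcd' + chr(1972)

# UTF-8 can represent every code point, so encode/decode is a lossless round trip.
utf8_bytes = u.encode('utf-8')
round_tripped = utf8_bytes.decode('utf-8')

# Try each error handler against ASCII.
# 'strict' raises UnicodeEncodeError instead of returning bytes,
# so the exception object is stored in its place.
handlers = ['strict', 'ignore', 'replace', 'xmlcharrefreplace',
            'backslashreplace', 'namereplace']
results = {}
for handler in handlers:
    try:
        results[handler] = u.encode('ascii', handler)
    except UnicodeEncodeError as exc:
        results[handler] = exc

for handler, outcome in results.items():
    print(f'{handler:18} {outcome!r}')
```

Note that 'ignore' and 'replace' throw information away, so the bytes they produce cannot be decoded back to the original string.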

The low-level routines for registering and accessing the available encodings are found in the codecs module. Implementing new encodings also requires understanding the codecs module. However, the encoding and decoding functions returned by this module are usually more low-level than is comfortable, and writing new encodings is a specialized task, so the module won’t be covered in this HOWTO.
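To give a rough idea of what "more low-level than is comfortable" means, here is a small sketch (my own, not part of the HOWTO) that uses `codecs.lookup()` to fetch the UTF-8 codec and calls its encoder and decoder directly. Unlike `str.encode()`, these functions return a tuple of the result and the amount of input consumed.

```python
import codecs

# Look up the codec registered under the name 'utf-8'.
info = codecs.lookup('utf-8')
print(info.name)

sample = 'abcd' + chr(1972)

# The low-level encoder returns (bytes, number_of_characters_consumed).
data, chars_consumed = info.encode(sample)
print(data, chars_consumed)

# The low-level decoder returns (str, number_of_bytes_consumed).
text, bytes_consumed = info.decode(data)
print(text == sample, bytes_consumed)
```

For everyday work `str.encode()` and `bytes.decode()` are normally all you need; the tuple-returning functions mostly matter when writing or wrapping codecs.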

https://docs.python.org/3/howto/unicode.html
