 python2 string formatting - old and new - different for unicode
While replying to another thread, I noticed something strange. Consider the following script, run under Python 2.7:

import forecastio

def main():
    api_key = "API KEY"
    lat = -31.967819
    lng = 115.87718
    forecast = forecastio.load_forecast(api_key, lat, lng) 
    by_day = forecast.daily()

    print "===========Daily Data========="
    print "Daily Summary: %s" %(by_day.summary)
    print "===========Daily Data========="
    print "Daily Summary: {}".format(by_day.summary)

if __name__ == "__main__":
    main()

And the result:

===========Daily Data=========
Daily Summary: Light rain on Friday through Monday, with temperatures falling to 15°C on Saturday.
===========Daily Data=========
Traceback (most recent call last):
  File "", line 53, in <module>
    main()
  File "", line 46, in main
    print "Daily Summary: {}".format(by_day.summary)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in position 68: ordinal not in range(128)
So old-style string formatting has no problem with the Unicode character, while new-style formatting with format() raises an error. The behavior is the same if I explicitly specify {:s}. I find it odd that there is a difference. Am I missing something?
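A likely explanation, offered as a sketch rather than a definitive answer: in Python 2, the % operator promotes the whole result to unicode as soon as one argument is a unicode object, whereas str.format() on a byte-string template converts the argument to a byte string with the default ASCII codec. Position 68 in the traceback is the index of the degree sign inside the summary itself, which fits that reading. The snippet below mimics the implicit conversion with an explicit encode() and shows the usual workaround, a unicode template. The summary variable is a stand-in for by_day.summary, copied from the output above; the code is written to run on both Python 2.7 and Python 3.

```python
# Stand-in for by_day.summary (a unicode object on Python 2),
# copied from the output shown above.
summary = (u"Light rain on Friday through Monday, "
           u"with temperatures falling to 15\xb0C on Saturday.")

# On Python 2, a byte-string template makes str.format() convert the
# argument with the ASCII codec. This explicit encode mimics that step.
try:
    summary.encode("ascii")
    failed_at = None
except UnicodeEncodeError as exc:
    failed_at = exc.start  # index of the first unencodable character

# Workaround: make the template itself unicode, so nothing is encoded.
new_style = u"Daily Summary: {}".format(summary)
old_style = u"Daily Summary: %s" % summary

print(failed_at)
print(new_style == old_style)
```

With a u"" template, both formatting styles produce the same unicode result, and the encoding to the terminal's codec is left to print.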
From the docs:
Quote:
Converting to Bytes

The opposite method of bytes.decode() is str.encode(), which returns a bytes representation of the Unicode string, encoded in the requested encoding.

The errors parameter is the same as the parameter of the decode() method but supports a few more possible handlers. As well as 'strict', 'ignore', and 'replace' (which in this case inserts a question mark instead of the unencodable character), there is also 'xmlcharrefreplace' (inserts an XML character reference), backslashreplace (inserts a \uNNNN escape sequence) and namereplace (inserts a \N{...} escape sequence).

The following example shows the different results:

>>> u = chr(40960) + 'abcd' + chr(1972)
>>> u.encode('utf-8')
b'\xea\x80\x80abcd\xde\xb4'
>>> u.encode('ascii')
Traceback (most recent call last):
UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in
 position 0: ordinal not in range(128)
>>> u.encode('ascii', 'ignore')
b'abcd'
>>> u.encode('ascii', 'replace')
b'?abcd?'
>>> u.encode('ascii', 'xmlcharrefreplace')
b'&#40960;abcd&#1972;'
>>> u.encode('ascii', 'backslashreplace')
b'\\ua000abcd\\u07b4'
>>> u.encode('ascii', 'namereplace')
b'\\N{YI SYLLABLE IT}abcd\\u07b4'

The low-level routines for registering and accessing the available encodings are found in the codecs module. Implementing new encodings also requires understanding the codecs module. However, the encoding and decoding functions returned by this module are usually more low-level than is comfortable, and writing new encodings is a specialized task, so the module won’t be covered in this HOWTO.
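Applied to the string from the first post, the handlers quoted above behave as sketched here. The text variable is a hypothetical shortened stand-in for by_day.summary. The quoted passage describes Python 3; 'namereplace' was only added in Python 3.5, so it is left out to keep the snippet runnable on Python 2.7 as well.

```python
# Hypothetical short stand-in for the text returned by by_day.summary.
text = u"15\xb0C"

utf8_bytes = text.encode("utf-8")                    # lossless encoding
ignored = text.encode("ascii", "ignore")             # drops the degree sign
replaced = text.encode("ascii", "replace")           # substitutes "?"
xml_ref = text.encode("ascii", "xmlcharrefreplace")  # XML character reference
escaped = text.encode("ascii", "backslashreplace")   # backslash escape as ASCII text

# Decoding the UTF-8 bytes restores the original text exactly.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == text)
```

Only the UTF-8 variant round-trips; the ASCII handlers all lose or rewrite the degree sign, so they suit logging or plain-ASCII output more than data that must be read back.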
