Python Forum

Full Version: How to send unicode string encoded in utf-8 in http request in Python
I'm trying to send a unicode string, which contains chinese characters, from a python program to a web server.

The code for this is:
import json
import requests

# Serialize the payload; ensure_ascii=False keeps the Chinese
# characters as-is in the JSON string.
data = json.dumps({"Input": {"Row": [{"BusinessCard": "那是一句话"}]}}, ensure_ascii=False)
# Encode the JSON string to UTF-8 bytes.
encoded_data = data.encode('utf-8')

r = requests.post("", data=encoded_data,
                  headers={'Content-Type': 'application/json; charset=UTF-8'},
                  auth=requests.auth.HTTPBasicAuth("user", "password"))
The encoded_data variable:
b'{"Input": {"Row": [{"BusinessCard": "\xe9\x82\xa3\xe6\x98\xaf\xe4\xb8\x80\xe5\x8f\xa5\xe8\xaf\x9d"}]}}'
My problem is that if I send this to the server, the server doesn't recognize the UTF-8 encoded data as Chinese characters. It just sees '"\xe9\x82\xa3...' and can't handle it.

But I'm forced to encode the data to UTF-8, or the post method gives me this Unicode error:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 37-41: Body ('那是一句话') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
If I send the request from Postman, it works fine. The Postman request is similar; it carries only the header for basic authentication and the Content-Type header.

Also, if I send a test request from the Python program to a simple Spring Boot server with a REST controller, the controller/Spring correctly recognizes the encoded values as Chinese characters.

How can I send the JSON data in the body like Postman does, so that the server understands the Chinese characters?

I searched for a few hours but couldn't find any answer to a similar case.

I would be really thankful for some help.


OK, I got some help from Stack Overflow.

I set the ensure_ascii parameter to True, and now it works.
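As a side note, requests can do this serialization for you if you pass the dict through the json= parameter instead of a pre-encoded body: it runs json.dumps with its default ensure_ascii=True and sets the Content-Type header itself. A minimal sketch (the URL here is a placeholder, not the real endpoint) that prepares the request without sending it, so you can inspect what would go on the wire:

```python
import requests

payload = {"Input": {"Row": [{"BusinessCard": "那是一句话"}]}}

# Prepare (but do not send) the request to inspect the body.
# "https://example.com/api" is a placeholder URL.
req = requests.Request(
    "POST",
    "https://example.com/api",
    json=payload,  # requests serializes this with json.dumps (ensure_ascii=True)
    auth=requests.auth.HTTPBasicAuth("user", "password"),
).prepare()

print(req.headers["Content-Type"])  # application/json
print(req.body)  # the Chinese characters appear as \uXXXX escapes
```

Because the resulting body is pure ASCII, it never hits the Latin-1 encoding error in the first place.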

But I would be thankful for some further explanation if someone knows more.

The documentation states: "If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is."
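The difference the documentation describes is easy to see directly with the json module:

```python
import json

data = {"BusinessCard": "那是一句话"}

# Default ensure_ascii=True: every non-ASCII character is escaped
# as \uXXXX, so the output string is pure ASCII.
escaped = json.dumps(data)
# ensure_ascii=False: the characters are emitted as-is and must be
# transferred in an encoding both sides agree on, such as UTF-8.
raw = json.dumps(data, ensure_ascii=False)

print(escaped)  # {"BusinessCard": "\u90a3\u662f\u4e00\u53e5\u8bdd"}
print(raw)      # {"BusinessCard": "那是一句话"}
```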

Why do servers have no trouble with escaped Unicode characters that have been encoded to UTF-8, but trouble with non-escaped Unicode characters that have also been encoded to UTF-8?
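A plausible explanation: the escaped output is pure ASCII, and ASCII bytes decode identically under Latin-1, UTF-8, and most other encodings, so even a server that guesses the wrong charset hands its JSON parser intact \uXXXX escapes, and the parser reconstructs the characters itself. The raw form only survives if the server actually decodes the bytes as UTF-8; decoded as Latin-1, each UTF-8 byte becomes a separate wrong character (mojibake). A small demonstration:

```python
import json

payload = {"BusinessCard": "那是一句话"}
raw = json.dumps(payload, ensure_ascii=False).encode("utf-8")
escaped = json.dumps(payload).encode("utf-8")

# A server that wrongly assumes Latin-1 when decoding the body:
print(raw.decode("latin-1"))      # mojibake, the characters are lost
print(escaped.decode("latin-1"))  # still valid JSON, escapes intact

# Parsing the mis-decoded escaped body still recovers the text:
print(json.loads(escaped.decode("latin-1"))["BusinessCard"])  # 那是一句话
```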
Very useful when trying to send data with Chinese characters via a REST API. Thank you for your post.