Python Forum

How to json dump Japanese
Help, please.

I am writing a spider to collect messages from a Japanese website, and I store the data with the json module.

However, by default json dumps the data as Unicode escape sequences, so I can't see the raw Japanese.

One solution I found on the internet is:
json.loads('このコーディネートのスタイリストについて',encoding='utf-8')
But this keyword is not available in json.dump!
json.dump('このコーディネートのスタイリストについて',fp,encoding='utf-8')
This code raises an error saying it can't decode 'gbk'!

But I can't encode my data structure into Unicode.
How can I see raw Japanese in my JSON file?
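import json
my_data = '{"some_japanese_text":"このコーディネートのスタイリストについて"}'

# load it into a dict (json.loads no longer takes an encoding argument;
# it was removed in Python 3.9)
json_data = json.loads(my_data)
print(type(json_data))

# print the string
print(json.dumps(json_data, indent=2, ensure_ascii=False))

# or write it to foo.json; open the file as UTF-8 explicitly so the
# platform default encoding (e.g. gbk) is not used
with open('foo.json', 'w', encoding='utf-8') as jf:
    json.dump(json_data, jf, indent=2, ensure_ascii=False)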
Output:
<class 'dict'>
{
  "some_japanese_text": "このコーディネートのスタイリストについて"
}
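If you want to double-check what actually lands on disk, here is a minimal sketch. A likely (but unconfirmed) reason for the 'gbk' error in the question is that open() without an encoding argument falls back to the platform's locale encoding, which is often gbk on a Chinese-locale Windows machine. The temporary directory and the variable names below are only assumptions for this illustration, not part of the original code.

```python
import json
import os
import tempfile

data = {"some_japanese_text": "このコーディネートのスタイリストについて"}

# Hypothetical file location: a temporary directory, used only for this sketch.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "foo.json")

    # An explicit encoding keeps the result independent of the OS locale.
    with open(path, "w", encoding="utf-8") as jf:
        json.dump(data, jf, indent=2, ensure_ascii=False)

    # Read the raw bytes back to check what was really stored in the file.
    with open(path, "rb") as bf:
        raw_bytes = bf.read()

# Decoding as UTF-8 should give back readable Japanese, with no \u escapes.
file_text = raw_bytes.decode("utf-8")
print(file_text)
```

Reading the file in binary mode and decoding it yourself avoids any guessing about which encoding a later open() call would pick.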
To serialize Unicode or other non-ASCII data as-is instead of as \u escape sequences, use the ensure_ascii parameter.

Both json.dump() and json.dumps() have an ensure_ascii parameter. It is True by default, so all non-ASCII characters in the output are escaped. With ensure_ascii=False, these characters are output as-is.

json.dumps('このコーディネートのスタイリストについて', ensure_ascii=False)
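To see the difference side by side, something like the following should work. It is only a small sketch; the variable names are made up for the example.

```python
import json

text = "このコーディネートのスタイリストについて"

# Default behaviour (ensure_ascii=True): every non-ASCII character is
# written as a \uXXXX escape, so the result is pure ASCII.
escaped = json.dumps(text)

# With ensure_ascii=False the characters are kept as they are.
raw = json.dumps(text, ensure_ascii=False)

print(escaped)
print(raw)

# Both forms are valid JSON and decode back to the same Python string.
same_after_loading = json.loads(escaped) == json.loads(raw) == text
print(same_after_loading)
```

So the escaped form is not wrong or lossy, it is just harder for a human to read. ensure_ascii=False only changes how the text looks in the output.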