May-15-2020, 08:24 AM
(This post was last modified: May-15-2020, 08:43 AM by DreamingInsanity.)
(May-14-2020, 06:11 PM) Larz60+ wrote: Please show what you've tried so far.
As I said, I have already tried combinations of .encode()/.decode(), with no luck.
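For reference, the usual `.encode()`/`.decode()` combination for turning literal \uXXXX text into characters goes through the `unicode_escape` codec. Here is a minimal sketch (the helper name is hypothetical) of why it tends to disappoint on text like this. It works on pure-ASCII input, but the codec reads every non-escape byte as Latin-1, so any non-ASCII characters already in the string come out mangled.

```python
def unescape_with_codec(text: str) -> str:
    # Hypothetical helper: the common .encode()/.decode() round trip.
    # "unicode_escape" interprets backslash escapes such as \u00fc, but it
    # treats every other byte as Latin-1, so UTF-8 input gets mangled.
    return text.encode("utf-8").decode("unicode_escape")


print(unescape_with_codec("gr\\u00fcn"))  # grün  (ASCII-only input: fine)
print(unescape_with_codec("ü \\u00fc"))   # Ã¼ ü  (the existing "ü" is mangled)
```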
My next idea was to have a function that loops through the string until it finds a backslash and then removes that backslash, and that backslash only. (Every time it found a run of backslashes, it would remove just one of them.) So an escaped Unicode id like \\u... would become \u... once it had gone through the function.
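The backslash-stripping idea can be sketched with a single regular expression. It finds every run of backslashes and drops the first one in each run, so \\u00a0 becomes \u00a0 and \" becomes ". Applied once, it removes exactly one level of escaping everywhere. The function name is hypothetical.

```python
import re


def strip_one_backslash(text: str) -> str:
    # Hypothetical helper: match each run of one or more backslashes and
    # keep everything except the first backslash in the run.
    return re.sub(r"\\(\\*)", r"\1", text)


print(strip_one_backslash(r"\\u00a0"))  # \u00a0
print(strip_one_backslash(r"\"x\""))    # "x"
```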
The issue with that is that the number of backslashes is inconsistent: there's one backslash before a quotation mark, but there are three in the HTML line breaks, so my function wouldn't work.
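Inconsistent counts like these are what you would expect if the data is JSON encoded twice, i.e. a JSON document stored as a string inside another JSON document. That is an assumption about how the file was produced, but the pattern is easy to reproduce. In the sketch below (the sample data is made up), quotes that belong to the inner document's structure get one backslash. Quotes inside the inner document's own strings get three: an escaped backslash plus an escaped quote. By the same rule, a \u00a0 escape in the inner document becomes \\u00a0 in the outer one.

```python
import json

# Encode a small document twice, the way a JSON string nested inside JSON
# would be produced. The sample data is made up for illustration.
inner = json.dumps({"h": 'say "hi"'})
outer = json.dumps(inner)

print(inner)  # {"h": "say \"hi\""}
print(outer)  # "{\"h\": \"say \\\"hi\\\"\"}"
```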
The simplest way, I think, is this:
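```python
# Map each literal escape sequence (as it appears in the text) to its character.
# The keys need a doubled backslash (or a raw string): "\u00a0" on its own is
# already the no-break space character, not the six-character escape.
chars = {
    "\\u00a0": " ",  # no-break space
    "\\u00fc": "ü",
    # ...
}
# json_text holds the escaped json (named so it doesn't shadow the json module)
for escape, char in chars.items():
    # str.replace() returns a new string, so the result must be reassigned
    json_text = json_text.replace(escape, char)
```

Just replace the Unicode ids with their respective characters. I have to list all the Unicode characters that are likely to show up, but that isn't too much of an issue since I know there aren't many.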
The reason I haven't done this is that I'm not a big fan of hard-coding values: if you hard-code something and the source changes (for instance, the json changes), then nine times out of ten it's going to break your program.
This may well be the route I have to take, however.
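One way to avoid the hard-coded table is to decode every match with `chr()`. This sketch assumes every escape in the text has the \uXXXX form (exactly four hex digits), and the helper name is hypothetical. It has two limitations. Surrogate pairs (characters outside the Basic Multilingual Plane, such as most emoji) come out as two separate surrogate characters. It also ignores whether the backslash is itself escaped, so it only suits text with a single level of escaping.

```python
import re

# Hypothetical pattern: a literal backslash, "u", and exactly four hex digits.
UNICODE_ID = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_unicode_ids(text: str) -> str:
    # Hypothetical helper: replace every \uXXXX sequence with the
    # character whose code point is that hex number.
    return UNICODE_ID.sub(lambda m: chr(int(m.group(1), 16)), text)


print(decode_unicode_ids(r"gr\u00fcn"))  # grün
```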
I'll also give a little more info on where this json comes from:
The json is stored in a JavaScript file, in a variable called 'ACTIVITY_DATA'.
When I use requests to get the whole page, I start by stripping out the variable assignment:
answer_json = requests.get(src).text.replace("ACTIVITY_DATA = ", "")
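A slightly less brittle way to pull the JSON out of the script is to capture everything from the first { after the assignment up to the last } with a regular expression, which tolerates extra whitespace around = and a trailing semicolon. This sketch assumes the file contains a single assignment of the form ACTIVITY_DATA = {...}; and that nothing after it contains a closing brace. The helper name and sample text are hypothetical. In practice the text would come from requests.get(src).text.

```python
import re

# Capture from the first "{" after "ACTIVITY_DATA =" up to the last "}".
ASSIGNMENT = re.compile(r"ACTIVITY_DATA\s*=\s*(\{.*\})", re.DOTALL)


def extract_activity_data(script: str) -> str:
    # Hypothetical helper: return the JSON text assigned to ACTIVITY_DATA.
    match = ASSIGNMENT.search(script)
    if match is None:
        raise ValueError("ACTIVITY_DATA assignment not found")
    return match.group(1)


sample = 'ACTIVITY_DATA = {"1":"[{\\"a\\":1}]"};\n'  # made-up sample
print(extract_activity_data(sample))  # {"1":"[{\"a\":1}]"}
```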
The json has some weird quotes in it that make it invalid (the pair wrapping the [{...}] value):
{"1":"[{.........}]"}
So I do some slicing to remove them:
new_answers = answer_json[:5] + answer_json[6:-2] + answer_json[-3:-2]
That leaves me with the json I have now, which is still full of escape characters.
Never mind, ignore everything; it turns out I'm an idiot. The json is valid with those quotation marks: the value of "1" is just a string. The only issue I was having was that it wouldn't format (pretty-print) with them in there, but formatting doesn't matter when you never look at the json anyway.
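Since the value of "1" is a string that itself contains JSON, one way to get at the data without any hand-written unescaping is to parse twice. The first json.loads call undoes the outer level of escaping (\" becomes ", \\u00a0 becomes \u00a0, \\\/ becomes \/). The second parses the inner array and turns the remaining escapes into real characters. The sample text below is made up to mimic the {"1":"[{...}]"} structure.

```python
import json

# Made-up sample with the same shape as the real data: {"1": "<JSON text>"}.
raw = r'{"1":"[{\"text\":\"a\\u00a0b<br \\\/>\"}]"}'

outer = json.loads(raw)           # outer["1"] is now a plain string of JSON text
answers = json.loads(outer["1"])  # the inner array, fully decoded

print(answers[0]["text"] == "a\u00a0b<br />")  # True
```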
The reason I was getting errors when parsing the json was that there are some characters in it (maybe some of the no-break spaces?) that the library doesn't like.
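Assuming "the library" is Python's built-in json module: a no-break space (U+00A0) on its own is accepted inside a JSON string. Raw control characters inside a string, such as a literal newline or tab, are rejected with an "Invalid control character" error. The debugging helper below is hypothetical. It prints the JSONDecodeError, which records exactly where the parser stopped, and then retries with strict=False. That retry only helps when the problem really is a control character.

```python
import json


def try_parse(text: str):
    # Hypothetical debugging helper: parse strictly, and on failure report
    # where the parser stopped and which character it found there.
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        bad = repr(text[err.pos]) if err.pos < len(text) else "end of input"
        print(f"{err} -> {bad}")
        # Control characters inside strings are allowed when strict=False.
        return json.loads(text, strict=False)


print(try_parse('"a\u00a0b"'))      # no-break space: parses without error
print(try_parse('"line1\nline2"'))  # literal newline: reported, then parsed
```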