Posts: 9
Threads: 3
Joined: Apr 2021
Hi,
When I read from JSON, it recognises the special characters; however, when I use the write function, it has started falling over. Not sure why, but all of a sudden, as I introduced Asian countries into my API, it's failing. Not sure what I need to convert this string to?
I have the likes of Diósgyőr, and it fails on:
f.write('Diósgyőr')
Cheers,
J
Posts: 7,312
Threads: 123
Joined: Sep 2016
See your previous thread.
There we talked about using JSON rather than text when getting data from the API.
If you want to write a text file or something else, always use utf-8 encoding when reading and writing.
with open('out.txt', 'w', encoding='utf-8') as f_out:
    s = 'Diósgyőr'
    f_out.write(s)
with open('out.txt', 'r', encoding='utf-8') as f:
    print(f.read())
Output:
Diósgyőr
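Since the advice above is to keep API data as JSON where possible, here is a minimal sketch of writing and reading JSON with utf-8. The `teams` list and the `teams.json` file name are made up for illustration (the real API payload is not shown in the thread), and a temporary directory is used so nothing is left in the working folder.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample data; the real API payload is not shown in the thread.
teams = [{"name": "Diósgyőr"}, {"name": "Qarabağ"}]

# Write into a temporary directory (assumed location, for illustration only).
out_path = Path(tempfile.mkdtemp()) / "teams.json"

# ensure_ascii=False keeps characters like 'ó' and 'ő' readable in the file
# instead of writing them as \uXXXX escapes.
with open(out_path, "w", encoding="utf-8") as f_out:
    json.dump(teams, f_out, ensure_ascii=False, indent=2)

# Read it back with the same encoding.
with open(out_path, "r", encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded[0]["name"])
```

Leaving `ensure_ascii` at its default would also round-trip correctly; the file would just contain escapes rather than the characters themselves.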
Posts: 9
Threads: 3
Joined: Apr 2021
(Apr-17-2021, 11:48 AM)snippsat Wrote: Your previous Thread
Hi mate,
Nice one and indeed about that earlier post. I remember the json converting it magically and thought maybe I had to convert it back.
Many thanks for that!
Cheers,
J
Posts: 9
Threads: 3
Joined: Apr 2021
(Apr-17-2021, 03:13 PM)johnboy1974 Wrote: (Apr-17-2021, 11:48 AM)snippsat Wrote: Your previous Thread
Hi,
Interestingly, I'm getting a different error.
JSON returns '1a Divisió'
File is writing: 1a Divisió
I've tried using utf-16 for no reason but it complained about a BOM.
response = requests.request("GET", url, headers=headers)
json_data = response.json()
myData = json_data['response']
.
.
.
f = open(strLeaguesFile, 'a', encoding='utf-8')
f.write(strLeagueName)
Cheers,
J
Posts: 7,312
Threads: 123
Joined: Sep 2016
Apr-17-2021, 05:33 PM
(This post was last modified: Apr-17-2021, 05:33 PM by snippsat.)
You must be messing something up on your side. How do you get strLeagueName?
If I write 1a Divisió, the output is correct, just the same as with Diósgyőr.
Better Unicode support was one of the biggest changes in the move to Python 3.
Since Python 3, all strings are stored as Unicode.
# Python 3.9
>>> s = '1a Divisió'
>>> s
'1a Divisió'
>>> s = '異體字字'
>>> s
'異體字字'
# Python 2.7
>>> s = '1a Divisió'
>>> s
'1a Divisi\xa2'
>>> s = '異體字字'
>>> s
'????'
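To make the "strings are Unicode" point concrete, here is a small sketch of the difference between a `str` and its encoded bytes, and of what a correctly written utf-8 file looks like when it is decoded with the wrong codec. Using cp1252 as the wrong codec is an assumption (it is a common Windows default); the thread does not say which viewer or encoding was involved.

```python
s = '1a Divisió'

# Encoding turns the Unicode string into bytes; 'ó' takes two bytes in utf-8.
utf8_bytes = s.encode('utf-8')
print(utf8_bytes)

# Decoding those bytes with the wrong codec (cp1252 is an assumed example)
# does not fail, it silently produces garbled text.
wrong = utf8_bytes.decode('cp1252')

# Decoding with the codec that was used for writing gives the original back.
right = utf8_bytes.decode('utf-8')

print(repr(wrong))
print(repr(right))
```

So a file can be written correctly and still look broken if the program used to view it assumes a different encoding.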
Posts: 9
Threads: 3
Joined: Apr 2021
(Apr-17-2021, 05:33 PM)snippsat Wrote: You most be messing up something on your side,how do you get strLeagueName ?
Very odd. I'm running python 3.65 so not sure if that's the issue? I'm reluctant to upgrade as always a bit nervous. Reckon I should?
That string is generated by (left a line out last time):
response = requests.request("GET", url, headers=headers)
json_data = response.json()
intServiceCount += 1
myData = json_data['response']
for data_line in myData:
    strLeagueName = data_line['league']['name']
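One way to narrow this down without hitting the API is to replace the request with a hard-coded JSON string and run the same parse-and-write steps. This is only a sketch: the `raw` payload, the second league name and the `leagues.txt` file name are hypothetical stand-ins, not the real response.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for the API response body; one name uses a JSON
# \u escape and the other a literal non-ASCII character.
raw = r'{"response": [{"league": {"name": "1a Divisi\u00f3"}}, {"league": {"name": "Süper Lig"}}]}'

json_data = json.loads(raw)
league_names = [item['league']['name'] for item in json_data['response']]

# Append each name on its own line, always stating the encoding explicitly.
leagues_file = Path(tempfile.mkdtemp()) / 'leagues.txt'
with open(leagues_file, 'a', encoding='utf-8') as f:
    for name in league_names:
        f.write(f'{name}\n')

# Read back with the same encoding and compare with what was parsed.
with open(leagues_file, 'r', encoding='utf-8') as f:
    written = f.read().splitlines()

print(written == league_names)
```

If this round-trips cleanly but the real script does not, the difference is more likely in the incoming data or in how the file is viewed than in the write itself.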
Posts: 7,312
Threads: 123
Joined: Sep 2016
Do print(repr(strLeagueName)).
If you get the output below, it should work. If not, try running from the command line rather than from an IDE/editor.
>>> print(repr(strLeagueName))
'1a Divisió'
johnboy1974 Wrote: Very odd. I'm running python 3.65 so not sure if that's the issue? I'm reluctant to upgrade as always a bit nervous. Reckon I should?
Yes, you should upgrade.
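If repr() output is hard to judge by eye, a slightly more explicit check is to list the code point and Unicode name of every non-ASCII character. The `describe` helper below is a hypothetical name introduced for this sketch; it only uses the standard `unicodedata` module.

```python
import unicodedata

def describe(text):
    """Return (character, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, f'U+{ord(ch):04X}', unicodedata.name(ch, 'UNKNOWN'))
        for ch in text
        if ord(ch) > 127
    ]

# A correctly decoded string shows exactly the characters you expect.
for row in describe('Qarabağ'):
    print(row)

# ascii() is a quick alternative: it escapes everything outside ASCII.
print(ascii('1a Divisió'))
```

A string that was already garbled before it reached Python would instead list unexpected characters here, such as U+00C3 followed by U+00B3 in place of a single U+00F3.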
Posts: 9
Threads: 3
Joined: Apr 2021
(Apr-17-2021, 07:24 PM)snippsat Wrote: Do print(repr(strLeagueName))
Hi mate,
Apologies for such a late reply. I fell ill and just come back to life(ish). Righty-o, I've upgraded to latest python and also the latest version of PyCharm... Thanks again for your reply!! I've done some more testing and I'm seriously lost with this... It's an odd one.
Firstly, running from the command line produced the same results. Now for the weird part...
1) When I use encoding='utf-8' and have f = open(strTablesFile, 'a', encoding='utf-8'), I get the following:
Butrinti Sarandë gets written as Butrinti Sarandë
Qarabağ gets written as Qarabag
2) When I leave out the encoding and have: f = open(strTablesFile, 'a'). I get the following:
Butrinti Sarandë gets written as Butrinti Sarandë --> Works!!
Qarabağ gets written as UnicodeEncodeError: 'charmap' codec can't encode character '\u0131' in position 6: character maps to <undefined> --> Crashes. When I stick it in a try statement and print the string to the console, it prints as I'd expect it to:
Qarabağ
So it's a bit of a mixed bag really and now I'm properly lost!!
Cheers!
Posts: 7,312
Threads: 123
Joined: Sep 2016
(Apr-23-2021, 01:44 PM)johnboy1974 Wrote: 2) When I leave out the encoding and have: f = open(strTablesFile, 'a'). I get the following:
Do not do this. Without an encoding argument, Python falls back to the platform's default encoding, which in your example is a charmap codec (probably cp1252 on Windows).
(Apr-23-2021, 01:44 PM)johnboy1974 Wrote: So it's a bit of a mixed bag really and now I'm properly lost!! Qarabağ
There is no way Python makes a mistake if that's the correct input.
with open('out.txt', 'a', encoding='utf-8') as f_out:
    s = 'Qarabağ'
    f_out.write(f'{s}\n')
with open('out.txt', 'r', encoding='utf-8') as f:
    print(f.read())
So if you run this 5 times, this is what is in out.txt.
Output:
Qarabağ
Qarabağ
Qarabağ
Qarabağ
Qarabağ
You should get the same output running this code, since here we start with Qarabağ from within Python.
That's why I mentioned doing print(repr(word_to_check)),
to see if it's actually Qarabağ that comes into Python.
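For the case where the encoding argument is left out, the sketch below shows how to check which encoding Python would pick by default, and why one name can succeed while another crashes. Using cp1252 is an assumption (the thread only shows a 'charmap' error, which is consistent with it), and `can_encode` is a hypothetical helper name.

```python
import locale

# The encoding open() falls back to when none is given depends on the platform.
print(locale.getpreferredencoding(False))

def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError as exc:
        bad = exc.object[exc.start]
        print(f'{encoding} cannot encode {bad!r} at position {exc.start}')
        return False

# 'ë' exists in cp1252, so this name can be written with that codec.
print(can_encode('Butrinti Sarandë', 'cp1252'))

# 'ğ' does not exist in cp1252, so writing it raises UnicodeEncodeError.
print(can_encode('Qarabağ', 'cp1252'))

# utf-8 can represent every Unicode character.
print(can_encode('Qarabağ', 'utf-8'))
```

This is why passing encoding='utf-8' on every open() call, for both writing and reading, is the safer habit.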