Python Forum
More non-English characters
#1
Hi,

When I read from JSON, it recognises the special characters; however, the write function has started falling over. Not sure why, but it started failing all of a sudden when I introduced Asian countries into my API. What do I need to convert this string to?

I have the likes of Diósgyőr, and it fails on:
f.write('Diósgyőr')

Cheers,
J
#2
Your previous thread
There we talked about using json, not text, when getting data from an API.
So if you want to write to a text file or anything else, always use utf-8 encoding when reading and writing.
with open('out.txt', 'w', encoding='utf-8') as f_out:
    s = 'Diósgyőr'
    f_out.write(s)

with open('out.txt', 'r', encoding='utf-8') as f:
    print(f.read())
Output:
Diósgyőr
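To see why the plain f.write() failed, the sketch below writes the same string once with cp1252 and once with UTF-8. It assumes the failing machine's default encoding was cp1252 (a common Windows default; the thread doesn't say which one it was), and round_trip() is a hypothetical helper for the demo, not code from the thread.

```python
import os
import tempfile


def round_trip(text, encoding):
    """Write text to a temporary file with the given encoding, then read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'out.txt')
        with open(path, 'w', encoding=encoding) as f_out:
            f_out.write(text)
        with open(path, 'r', encoding=encoding) as f_in:
            return f_in.read()


s = 'Diósgyőr'

# Assumption: the failing machine defaulted to cp1252. 'ó' exists in cp1252,
# but 'ő' (U+0151) does not, so writing the string raises an error.
try:
    round_trip(s, 'cp1252')
except UnicodeEncodeError as err:
    print('cp1252 failed:', err)

# UTF-8 can encode every Unicode character, so the text survives unchanged.
print(round_trip(s, 'utf-8'))
```

Passing encoding='utf-8' explicitly, as in the answer above, makes the result independent of whatever the OS default happens to be.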
#3
(Apr-17-2021, 11:48 AM)snippsat Wrote: Your previous Thread

Hi mate,

Nice one, and you're right about that earlier post. I remembered JSON converting it magically and thought maybe I had to convert it back.

Many thanks for that!

Cheers,
J
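Nothing has to be converted back: json.loads() already turns \uXXXX escapes into an ordinary Python str, and an encoding only matters when that str is written to a file. The payload below is a made-up example; an API that sends the characters unescaped is handled the same way.

```python
import json

# Hypothetical payload that spells the non-ASCII characters as \uXXXX escapes.
raw = '{"name": "Di\\u00f3sgy\\u0151r"}'

data = json.loads(raw)
name = data['name']
print(name)  # Diósgyőr (a normal str, no further conversion needed)

# json.dumps escapes non-ASCII characters again by default (ensure_ascii=True)...
print(json.dumps(data))                      # {"name": "Di\u00f3sgy\u0151r"}
# ...while ensure_ascii=False keeps the characters as they are.
print(json.dumps(data, ensure_ascii=False))  # {"name": "Diósgyőr"}
```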
#4
(Apr-17-2021, 03:13 PM)johnboy1974 Wrote:

Hi,

Interestingly, I'm getting a different error.

JSON returns '1a Divisió'
File is writing: 1a Divisió

I've tried using utf-16 for no reason but it complained about a BOM.
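A guess at why UTF-16 complained: unlike UTF-8, Python's 'utf-16' codec starts its output with a byte order mark (BOM) and uses at least two bytes per character, so it doesn't mix with a file that already holds UTF-8 text. A quick comparison of the bytes for the same string:

```python
s = '1a Divisió'

utf8 = s.encode('utf-8')
utf16 = s.encode('utf-16')
utf8_sig = s.encode('utf-8-sig')

# UTF-8: one byte per ASCII character, two for 'ó', and no BOM.
print(len(utf8), utf8)
# UTF-16: a 2-byte BOM first (byte order depends on the platform), then 2 bytes per character here.
print(len(utf16), utf16[:2])
# 'utf-8-sig' prepends the 3-byte UTF-8 BOM that some Windows tools look for.
print(utf8_sig[:3])
```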

response = requests.request("GET", url, headers=headers)
json_data = response.json()
myData = json_data['response']
.
.
.
f = open(strLeaguesFile, 'a', encoding='utf-8')
f.write(strLeagueName)
Cheers,
J
#5
You must be messing something up on your side. How do you get strLeagueName?
If I write 1a Divisió, the output is correct, just the same as with Diósgyőr.

Better Unicode handling was one of the biggest changes in the move to Python 3.
Since Python 3, all strings (str) are stored as Unicode.
# Python 3.9
>>> s = '1a Divisió'
>>> s
'1a Divisió' 

>>> s = '異體字字'
>>> s
'異體字字'

# Python 2.7
>>> s = '1a Divisió'
>>> s
'1a Divisi\xa2'

>>> s = '異體字字'
>>> s
'????'
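The same point in terms of types: a Python 3 str holds characters, and an encoding only comes into play when it is turned into bytes, for example when it is written to a file.

```python
s = 'Diósgyőr'

# A str holds Unicode code points; len() counts characters.
print(type(s).__name__, len(s))

# Encoding turns text into bytes; the byte count depends on the codec.
b = s.encode('utf-8')
print(type(b).__name__, len(b), b)

# Decoding with the same codec gives the original text back.
print(b.decode('utf-8') == s)
```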
 
#6
(Apr-17-2021, 05:33 PM)snippsat Wrote: You must be messing something up on your side. How do you get strLeagueName?

Very odd. I'm running Python 3.6.5, so I'm not sure if that's the issue. I'm reluctant to upgrade as I'm always a bit nervous. Reckon I should?

That string is generated by the following (I left a line out last time):
response = requests.request("GET", url, headers=headers)
json_data = response.json()

intServiceCount += 1

myData = json_data['response']

for data_line in myData:
    strLeagueName = data_line['league']['name']
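A sketch of that loop with the file handling tightened up: one with block, an explicit encoding='utf-8', and a newline after each name. The sample payload, leagues_file and league_name are hypothetical stand-ins for the thread's API response, strLeaguesFile and strLeagueName; the temporary directory only keeps the example self-contained.

```python
import json
import os
import tempfile

# Hypothetical sample shaped like json_data['response'] in the thread;
# in the real script this comes from response.json().
json_data = json.loads(
    '{"response": ['
    '{"league": {"name": "1a Divisi\\u00f3"}},'
    '{"league": {"name": "Qarabağ"}}'
    ']}'
)

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the thread's strLeaguesFile path.
    leagues_file = os.path.join(tmp, 'leagues.txt')

    # Open once, with an explicit encoding, and let 'with' close the file.
    with open(leagues_file, 'a', encoding='utf-8') as f:
        for data_line in json_data['response']:
            league_name = data_line['league']['name']
            f.write(f'{league_name}\n')

    # Read back with the same encoding that was used for writing.
    with open(leagues_file, 'r', encoding='utf-8') as f:
        names = f.read().splitlines()

print(names)
```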
#7
Do print(repr(strLeagueName)).
If you get this, it should work; if not, try running from the command line instead of an IDE/editor.
>>> print(repr(strLeagueName))
'1a Divisió'
johnboy1974 Wrote:Very odd. I'm running Python 3.6.5 so not sure if that's the issue? I'm reluctant to upgrade as always a bit nervous. Reckon I should?
Yes, you should upgrade.
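repr() still shows printable non-ASCII characters as-is, so look-alike characters can hide in its output. Listing code points with the standard unicodedata module makes it unambiguous which characters actually arrived; describe() is a hypothetical helper for that, and league_name stands in for strLeagueName.

```python
import unicodedata


def describe(text):
    """List (index, character, code point, Unicode name) for every non-ASCII character."""
    return [
        (i, ch, f'U+{ord(ch):04X}', unicodedata.name(ch, '<unnamed>'))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


league_name = 'Qarabağ'  # stand-in for strLeagueName
print(repr(league_name))
for row in describe(league_name):
    print(row)
```

For instance, unicodedata.name('\u0131') is 'LATIN SMALL LETTER DOTLESS I', a different character from 'ğ' (U+011F, 'LATIN SMALL LETTER G WITH BREVE').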
#8
(Apr-17-2021, 07:24 PM)snippsat Wrote: Do print(repr(strLeagueName))

Hi mate,

Apologies for such a late reply. I fell ill and have just come back to life(ish). Righty-o, I've upgraded to the latest Python and also the latest version of PyCharm... Thanks again for your reply!! I've done some more testing and I'm seriously lost with this... It's an odd one.

Firstly, running from the command line produced the same results. Now for the weird part...

1) When I use encoding='utf-8', i.e. f = open(strTablesFile, 'a', encoding='utf-8'), I get the following:
Butrinti Sarandë gets written as Butrinti Sarandë
Qarabağ gets written as Qarabag


2) When I leave out the encoding, i.e. f = open(strTablesFile, 'a'), I get the following:
Butrinti Sarandë gets written as Butrinti Sarandë --> Works!!
Qarabağ fails with UnicodeEncodeError: 'charmap' codec can't encode character '\u0131' in position 6: character maps to <undefined> --> Crashes. When I put it in a try statement and print the string to the console, it prints as I'd expect it to:
Qarabağ

So it's a bit of a mixed bag really, and now I'm properly lost!!

Cheers!
#9
(Apr-23-2021, 01:44 PM)johnboy1974 Wrote: 2) When I leave out the encoding and have: f = open(strTablesFile, 'a'). I get the following:
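Don't do this: without an encoding, Python falls back to the OS default, which in your case is a 'charmap' codec (a Windows code page) that can't encode every character.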
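Roughly what happens in case 2: on the Python versions discussed here, open() without encoding= falls back to the locale's preferred encoding, which locale.getpreferredencoding(False) reports. The sketch assumes that encoding is cp1252; the thread only shows that it was a 'charmap' codec, and the character in its error message differs from the one used here.

```python
import locale

# What open() uses when encoding= is left out (locale-dependent).
print(locale.getpreferredencoding(False))

name = 'Qarabağ'

# Assumption: the default is cp1252, which has no byte for 'ğ' (U+011F),
# so strict encoding raises the same kind of error as in the thread.
try:
    name.encode('cp1252')
except UnicodeEncodeError as err:
    print(err.reason, 'at position', err.start)

# errors='replace' avoids the crash but silently swaps the character for '?'.
print(name.encode('cp1252', errors='replace'))

# UTF-8 has a byte sequence for every character.
print(name.encode('utf-8'))
```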
(Apr-23-2021, 01:44 PM)johnboy1974 Wrote: So it's a bit of a mixed bag really and now I'm properly lost!! Qarabağ
There is no way Python makes a mistake if that's the correct input.
with open('out.txt', 'a', encoding='utf-8') as f_out:
    s = 'Qarabağ'
    f_out.write(f'{s}\n')

with open('out.txt', 'r', encoding='utf-8') as f:
    print(f.read())
So if you run it 5 times, this is what ends up in out.txt:
Output:
Qarabağ
Qarabağ
Qarabağ
Qarabağ
Qarabağ
You should get the same output running this code; here we start with Qarabağ from within Python.

That's why I mentioned doing print(repr(strLeagueName)),
to see if it's actually Qarabağ that comes into Python.
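If repr() shows the right text, one common reason a UTF-8 file still looks wrong is the program used to view it: a viewer that assumes cp1252 shows two odd characters for each accented letter. This is an assumption about what the viewer showed for Butrinti Sarandë in case 1, sketched with plain encode/decode calls:

```python
s = 'Butrinti Sarandë'

# The bytes a UTF-8 writer puts on disk ('ë' becomes two bytes).
data = s.encode('utf-8')

# A viewer that assumes cp1252 shows each of those bytes as its own character.
print(data.decode('cp1252'))  # Butrinti SarandÃ«

# Reading with the encoding that was used for writing restores the text.
print(data.decode('utf-8'))   # Butrinti Sarandë
```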