Posts: 7,320
Threads: 123
Joined: Sep 2016
Then you have to check whether it really is using Python 3.
# Python 3.7
>>> s = '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82'
>>> print(s.encode('latin1').decode('utf8'))
Άκης Τσιάμης
Python 2 guesses the encoding (which can also go wrong), but it works here.
>>> # Python 2.7
>>> s = '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82'
>>> print(s.decode('utf8'))
Άκης Τσιάμης
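A minimal sketch of why the Python 3 round trip above works, assuming the name started out as UTF-8 bytes that were decoded as latin1 somewhere along the way (the variable names are illustrative only):

```python
# Python 3: reproduce the garbled string, then undo it.
original = 'Άκης Τσιάμης'

# UTF-8 bytes wrongly decoded as latin1: every byte becomes one character.
garbled = original.encode('utf8').decode('latin1')
print(ascii(garbled))

# Reverse the mistake: back to the raw bytes, then decode them as UTF-8.
repaired = garbled.encode('latin1').decode('utf8')
print(repaired == original)  # True
```

The garbled value is the same `'\xce\x86\xce\xba...'` string used in the session above, which is why `.encode('latin1').decode('utf8')` recovers the Greek text.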
Posts: 76
Threads: 21
Joined: Jan 2017
Feb-15-2019, 02:23 PM
(This post was last modified: Feb-15-2019, 02:25 PM by nikos.)
And if instead of items I say "for i in names:", I get
Output: UnicodeEncodeError('latin-1', 'Άκης Τσιάμης', 0, 4, 'ordinal not in range(256)')
It shows the name correctly but with an error too!
It is indeed Python 3, as shown here:
Output: Python 3.6.7 (default, Dec 5 2018, 15:02:05)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82'
>>> print(s.encode('latin1').decode('utf8'))
Άκης Τσιάμης
>>>
Posts: 76
Threads: 21
Joined: Jan 2017
Hello, what must I do next?
Posts: 7,320
Threads: 123
Joined: Sep 2016
Feb-15-2019, 05:43 PM
(This post was last modified: Feb-15-2019, 05:46 PM by snippsat.)
Check your environment. Why are you using a WSGI script? I would never do this, since e.g. Flask (built on top of WSGI) makes everything simpler.
If you get the output you show, then what's the problem with decoding it?
items = ['', 'Alexander Lepsveridze', 'John Comeau', '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82', '\xce\x8c\xce\xbc\xce\xb9\xce\xbb\xce\xbf\xcf\x82 \xce\xa4\xcf\x83\xce\xbf\xcf\x84\xcf\x85\xce\xbb\xce\xaf\xce\xbf\xcf\x85']
names = [s.encode('latin1').decode('utf8') for s in items]
print(names[1:])
Output: ['Alexander Lepsveridze', 'John Comeau', 'Άκης Τσιάμης', 'Όμιλος Τσοτυλίου']
Posts: 2,125
Threads: 11
Joined: May 2017
Feb-15-2019, 06:42 PM
(This post was last modified: Feb-15-2019, 06:42 PM by DeaD_EyE.)
I guess the strings are stored in latin1 encoding, but you are using utf8 as charset for the database.
English text is based on ASCII, and ASCII maps directly onto both latin1 and utf8.
Try to switch your encoding of the database to latin1. Do this only with a backup.
The other way could be to load all strings from the database, decode them with latin1 (the original encoding), encode them to utf8, and write them back to the database. Maybe you can do this with a mysqldump.
It's better when everything is right in your database, instead of working around it at the application level.
I am not very familiar with mysql databases. In the past latin1 was the default standard setting.
The switch to utf8 was hard and many had issues with this. I guess if you google for 'mysql latin1 broken', you'll get many results.
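The "load, convert, write back" idea could be sketched roughly as follows. This is only an illustration under loud assumptions: it uses an in-memory `sqlite3` database as a stand-in (not the MySQL server from this thread), the table and column names are hypothetical, and in Python 3 `str` terms the conversion is the same latin1/utf8 round trip shown earlier in the thread. Take a backup before trying anything like this on real data.

```python
import sqlite3


def repair(text):
    """Undo UTF-8-read-as-latin1 garbling; return text unchanged if it does not apply."""
    try:
        return text.encode('latin1').decode('utf8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Stand-in database: sqlite3 in memory, NOT the MySQL setup discussed here.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE visitors (id INTEGER PRIMARY KEY, name TEXT)')  # hypothetical table
conn.executemany(
    'INSERT INTO visitors (name) VALUES (?)',
    [('John Comeau',),
     ('\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82',)],
)

# Load every row, repair it, and write back only the rows that changed.
rows = conn.execute('SELECT id, name FROM visitors').fetchall()
for row_id, name in rows:
    fixed = repair(name)
    if fixed != name:
        conn.execute('UPDATE visitors SET name = ? WHERE id = ?', (fixed, row_id))
conn.commit()

names = [name for (name,) in conn.execute('SELECT name FROM visitors ORDER BY id')]
print(names)
```

Rows that are plain ASCII or already correct pass through `repair` untouched, so running the loop a second time changes nothing.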
Posts: 76
Threads: 21
Joined: Jan 2017
Feb-15-2019, 07:07 PM
(This post was last modified: Feb-15-2019, 07:33 PM by nikos.)
(Feb-15-2019, 05:43 PM)snippsat Wrote: Check your environment. Why are you using a WSGI script? I would never do this, since e.g. Flask (built on top of WSGI) makes everything simpler.
If you get the output you show, then what's the problem with decoding it?
From console everything works normally
Output: root@superhost wsgi]# python3
Python 3.6.7 (default, Dec 5 2018, 15:02:05)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> items = ['', 'Alexander Lepsveridze', 'John Comeau', '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82', '\xce\x8c\xce\xbc\xce\xb9\xce\xbb\xce\xbf\xcf\x82 \xce\xa4\xcf\x83\xce\xbf\xcf\x84\xcf\x85\xce\xbb\xce\xaf\xce\xbf\xcf\x85']
>>> names = [s.encode('latin1').decode('utf8') for s in items]
>>> print(names[1:])
['Alexander Lepsveridze', 'John Comeau', 'Άκης Τσιάμης', 'Όμιλος Τσοτυλίου']
>>>
While from within the python3 script:
Output: [Fri Feb 15 21:05:15.306351 2019] [wsgi:error] [pid 16432] [remote 176.92.27.182:10408] ['Alexander Lepsveridze', 'John Comeau', '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82', '\xce\x8c\xce\xbc\xce\xb9\xce\xbb\xce\xbf\xcf\x82 \xce\xa4\xcf\x83\xce\xbf\xcf\x84\xcf\x85\xce\xbb\xce\xaf\xce\xbf\xcf\x85']
Why does it behave differently? From the console it is able to encode/decode, while from within the script it cannot?!
(Feb-15-2019, 06:42 PM)DeaD_EyE Wrote: Try to switch your encoding of the database to latin1. Do this only with a backup.
I just did, but the result remains the same: I get the same weird encoding when I try to print names.
PS: Would any of you like access to my system to help figure out what's wrong?
Posts: 76
Threads: 21
Joined: Jan 2017
Hello, what would the next step be?
Posts: 7,320
Threads: 123
Joined: Sep 2016
(Feb-15-2019, 07:07 PM)nikos Wrote: While from within the python3 script:
Post the code of the script, and you must tell us what you are using: is it pure WSGI/mod_wsgi or a framework?
Output: [Fri Feb 15 21:05:15.306351 2019] [wsgi:error] [pid 16432] [remote 176.92.27.182:10408] ['Alexander Lepsveridze', 'John Comeau', '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82', '\xce\x8c\xce\xbc\xce\xb9\xce\xbb\xce\xbf\xcf\x82 \xce\xa4\xcf\x83\xce\xbf\xcf\x84\xcf\x85\xce\xbb\xce\xaf\xce\xbf\xcf\x85']
There is no error here that I can see; it just shows the raw output from the DB.
I can get the output into a variable and then encode/decode it as shown earlier in the thread.
Posts: 76
Threads: 21
Joined: Jan 2017
Feb-16-2019, 05:08 PM
(This post was last modified: Feb-16-2019, 05:08 PM by nikos.)
I am using WSGI/mod_wsgi in my httpd.conf, but my Python script uses the Bottle framework.
Quote:I can get the output into a variable and then encode/decode it as shown earlier in the thread.
What do you mean? Only the console shows the names correctly. The script shows the raw output from the db, and when we try to encode/decode it like the following:
names = [s.encode('latin1').decode('utf8') for s in names]
it errors out:
Output: UnicodeEncodeError('latin-1', 'Άκης Τσιάμης', 0, 4, 'ordinal not in range(256)')
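The error output above can be read field by field. A small sketch, assuming (this is a guess, not something confirmed in the thread) that the string reaching `.encode('latin1')` inside the script is already proper Greek text rather than the garbled form:

```python
name = 'Άκης Τσιάμης'  # assumption: already decoded text, not the '\xce\x86...' form

error = None
try:
    name.encode('latin1')
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; 'exc' is cleared when the except block ends

# The exception carries the same five fields shown in the output above:
# encoding, object, start, end, reason.
print(error.encoding, error.start, error.end, error.reason)

# The offending slice is the first Greek word, which latin1 cannot represent.
print(error.object[error.start:error.end])
```

If that assumption holds, the round trip is simply not needed for that value: latin1 only covers code points below 256, and real Greek letters lie above that range.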
Posts: 7,320
Threads: 123
Joined: Sep 2016
What is the raw output of names before you try to encode it?
Where does it come from in the script? Can you post the code?
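One way to look at the raw value without the terminal or log encoding getting in the way is `ascii()`, which escapes every non-ASCII character. The sketch below pairs it with a rough classifier; `classify` is a hypothetical helper written for this illustration, and its "maybe garbled" answer is only a heuristic.

```python
def classify(text):
    """Rough guess at the state of a str (hypothetical helper, heuristic only)."""
    if all(ord(ch) < 128 for ch in text):
        return 'ascii'
    if all(ord(ch) < 256 for ch in text):
        # Every character fits in one byte: could be UTF-8 that was read as latin1.
        return 'maybe garbled'
    # Real Greek letters have code points above 255.
    return 'already decoded'


names = [
    'John Comeau',
    '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82',
    'Άκης Τσιάμης',
]

for name in names:
    # ascii() shows \x.. or \u.... escapes, so the two non-ASCII cases look different.
    print(classify(name), ascii(name))
```

A value printed with `\xce\x86` escapes still needs the latin1/utf8 round trip, while one printed with `\u0386` escapes is already correct and would raise `UnicodeEncodeError` if encoded to latin1.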
Quote:I am using WSGI/mod_wsgi in my httpd.conf
These are tools I would not use voluntarily at all.
To give an example of tools that I think are better:
For all local development, use the built-in web server in e.g. Flask or Django.
All the web-development work/testing I do is with Flask's built-in web server.
It is not often that I have something I want to host (share with the world).
If I want to host (share with the world), I would use Gunicorn with NGINX (better and easier to set up than Apache).
Or go serverless with AWS Lambda.
Quote:lets you run code without provisioning or managing servers.
So with the setup above I never touch WSGI/mod_wsgi or Apache's httpd.conf.
Example setup: How To Serve Flask Applications with Gunicorn and Nginx on Ubuntu 18.04.
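For trying things locally without Apache, here is a rough sketch of a bare WSGI application that returns the Greek names as UTF-8. It uses the standard library's `wsgiref` purely as a stand-in so it runs without installing anything; it is not the Flask or Bottle setup discussed in this thread, and the `NAMES` list and `app` function are illustrative only.

```python
from wsgiref.util import setup_testing_defaults

NAMES = ['Άκης Τσιάμης', 'Όμιλος Τσοτυλίου']  # sample data taken from the thread


def app(environ, start_response):
    """Minimal WSGI application: encode the text explicitly and declare the charset."""
    body = '\n'.join(NAMES).encode('utf8')
    start_response('200 OK', [
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ])
    return [body]


# Call the app directly with a fake request instead of starting a server.
environ = {}
setup_testing_defaults(environ)
captured = {}


def start_response(status, headers):
    captured['status'] = status
    captured['headers'] = headers


response = b''.join(app(environ, start_response))
print(captured['status'])
print(response.decode('utf8'))

# To serve it locally one could use the stdlib development server:
#   from wsgiref.simple_server import make_server
#   make_server('127.0.0.1', 8000, app).serve_forever()
```

The point of the sketch is that the response body is bytes produced by an explicit `.encode('utf8')`, and the `Content-Type` header tells the browser which charset to use.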