Python Forum
Want a list utf8 formatted but bytestrings found
#11
Then you have to check if it really is using Python 3.
# Python 3.7
>>> s = '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82'
>>> print(s.encode('latin1').decode('utf8'))
Άκης Τσιάμης
Python 2 guesses the encoding (which can also go wrong), but it works here.
>>> # Python 2.7
>>> s = '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82'
>>> print(s.decode('utf8'))
Άκης Τσιάμης
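A small sketch of why Python 3 needs the extra latin1 step: a str that holds characters like '\xce\x86' is not the same thing as a bytes object holding those bytes. If the database driver returned real bytes, one .decode('utf8') would be enough; the latin1 round trip is only needed when the UTF-8 bytes have already been (mis)decoded into a str. The variable names are illustrative.

```python
# The UTF-8 bytes for the first Greek name ("Άκης").
raw = b'\xce\x86\xce\xba\xce\xb7\xcf\x82'

# Case 1: a real bytes object, so a single decode is enough.
from_bytes = raw.decode('utf8')

# Case 2: the same bytes were wrongly decoded as latin1, giving a str
# whose code points equal the byte values (mojibake).
mojibake = raw.decode('latin1')

# latin1 maps code points 0-255 back to the identical byte values,
# so encoding with latin1 recovers the original UTF-8 bytes.
from_str = mojibake.encode('latin1').decode('utf8')

print(type(raw).__name__, type(mojibake).__name__)  # bytes str
print(from_bytes == from_str)                       # True
```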
#12
And if, instead of items, I write "for i in names:", I get:

Output:
UnicodeEncodeError('latin-1', 'Άκης Τσιάμης', 0, 4, 'ordinal not in range(256)')
It shows the name correctly but with an error too!
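The error most likely means the names were already decoded: 'Ά' (U+0386) has no latin1 byte, because latin1 only covers code points 0-255, which is exactly what "ordinal not in range(256)" says. The repair therefore has to be applied once, not twice. Below is a sketch of a hypothetical helper (fix_mojibake is not from any library) that only repairs text when the round trip actually works.

```python
def fix_mojibake(text):
    """Undo a UTF-8-read-as-latin1 mix-up, but only if the text needs it.

    Hypothetical helper for illustration; not part of any library.
    """
    try:
        return text.encode('latin1').decode('utf8')
    except UnicodeEncodeError:
        # Contains characters above U+00FF: already proper text
        # (e.g. Greek that was decoded correctly), so leave it alone.
        return text
    except UnicodeDecodeError:
        # Fits in latin1 but is not valid UTF-8 (e.g. a genuine 'é'),
        # so it was probably never mis-decoded.
        return text


broken = '\xce\x86\xce\xba\xce\xb7\xcf\x82'
fixed = fix_mojibake(broken)
print(fix_mojibake(fixed) == fixed)  # True: a second pass is a no-op
print(fix_mojibake('John Comeau'))   # John Comeau (ASCII is unchanged)
```

This is a heuristic: a genuine latin1 string that also happens to be valid UTF-8 would be changed, though that is rare for real names.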

It's indeed Python 3, as shown here:

Output:
Python 3.6.7 (default, Dec 5 2018, 15:02:05)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82'
>>> print(s.encode('latin1').decode('utf8'))
Άκης Τσιάμης
>>>
#13
Hello, what should I do next?
#14
Check your environment. Why are you using a raw WSGI script? I would never do this; e.g. Flask (built on top of WSGI) makes everything simpler.
If you get the output you show, then what's the problem with decoding it?
items = ['', 'Alexander Lepsveridze', 'John Comeau', '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82', '\xce\x8c\xce\xbc\xce\xb9\xce\xbb\xce\xbf\xcf\x82 \xce\xa4\xcf\x83\xce\xbf\xcf\x84\xcf\x85\xce\xbb\xce\xaf\xce\xbf\xcf\x85']
names = [s.encode('latin1').decode('utf8') for s in items]
print(names[1:])
Output:
['Alexander Lepsveridze', 'John Comeau', 'Άκης Τσιάμης', 'Όμιλος Τσοτυλίου']
#15
I guess the strings are stored in latin1 encoding, but you are using utf8 as charset for the database.
English text is plain ASCII, and ASCII maps identically to both latin1 and utf8, so only the non-ASCII (here Greek) names break.
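A quick way to see why the English names survive the charset mix-up while the Greek ones do not: for pure ASCII, latin1 and UTF-8 produce byte-for-byte identical output, whereas each Greek letter takes two bytes in UTF-8 and turns into two wrong characters when those bytes are read back as latin1. The sample values are taken from the thread.

```python
latin_text = 'John Comeau'
greek_text = '\u0386\u03ba\u03b7\u03c2'  # "Άκης"

# For pure ASCII, latin1 and UTF-8 produce the same bytes,
# so a charset mix-up is invisible.
print(latin_text.encode('latin1') == latin_text.encode('utf8'))  # True

# In UTF-8 each of these Greek letters takes 2 bytes.
utf8_bytes = greek_text.encode('utf8')
print(len(greek_text), len(utf8_bytes))  # 4 8

# Reading those UTF-8 bytes back as latin1 yields 8 wrong characters.
print(len(utf8_bytes.decode('latin1')))  # 8
```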

Try to switch your encoding of the database to latin1. Do this only with a backup.
Another way could be to load all strings from the database, decode them with latin1 (the original encoding), encode them to utf8 and write them back to the database. Maybe you can do this with mysqldump.

It's better to have everything correct in your database instead of working around it at the application level.


I am not very familiar with MySQL databases. In the past, latin1 was the default setting.
The switch to utf8 was hard and many people had issues with it. If you google for 'mysql latin1 broken', you'll probably get many results.
#16
(Feb-15-2019, 05:43 PM)snippsat Wrote: Check your environment,why are you using wsgi script,i would never do this as eg Flask(build on top of wsgi) that's make all simpler.
If get output as you show,then what's the problem to decode it?

From the console everything works normally:
Output:
[root@superhost wsgi]# python3
Python 3.6.7 (default, Dec 5 2018, 15:02:05)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> items = ['', 'Alexander Lepsveridze', 'John Comeau', '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82', '\xce\x8c\xce\xbc\xce\xb9\xce\xbb\xce\xbf\xcf\x82 \xce\xa4\xcf\x83\xce\xbf\xcf\x84\xcf\x85\xce\xbb\xce\xaf\xce\xbf\xcf\x85']
>>> names = [s.encode('latin1').decode('utf8') for s in items]
>>> print(names[1:])
['Alexander Lepsveridze', 'John Comeau', 'Άκης Τσιάμης', 'Όμιλος Τσοτυλίου']
>>>
While from within the Python 3 script:

Output:
[Fri Feb 15 21:05:15.306351 2019] [wsgi:error] [pid 16432] [remote 176.92.27.182:10408] ['Alexander Lepsveridze', 'John Comeau', '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82', '\xce\x8c\xce\xbc\xce\xb9\xce\xbb\xce\xbf\xcf\x82 \xce\xa4\xcf\x83\xce\xbf\xcf\x84\xcf\x85\xce\xbb\xce\xaf\xce\xbf\xcf\x85']
Why does it behave differently? From the console it can encode/decode, while from within the script it cannot?!
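When the console and the web app disagree, the first thing worth checking is whether they run the same interpreter with the same settings. The sketch below collects a few facts that commonly differ between an interactive shell and mod_wsgi; run it in both places and compare. It is a diagnostic under assumptions: under mod_wsgi, print output usually lands in the Apache error log, and sys.executable may not point at a python binary at all.

```python
import locale
import sys


def runtime_report():
    """Collect facts that often differ between a shell and mod_wsgi."""
    return {
        'version': sys.version,
        'executable': sys.executable,
        'default_encoding': sys.getdefaultencoding(),
        # Some WSGI servers replace sys.stdout with an object lacking .encoding.
        'stdout_encoding': getattr(sys.stdout, 'encoding', None),
        'preferred_encoding': locale.getpreferredencoding(False),
    }


for key, value in runtime_report().items():
    # %-formatting prints the same way under Python 2 and Python 3.
    print('%s = %s' % (key, value))
```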

(Feb-15-2019, 06:42 PM)DeaD_EyE Wrote: Try to switch your encoding of the database to latin1. Do this only with a backup.

I just did, but the result remains the same: I get the same weird encoding when I try to print names.

P.S. Would any of you like access to my system to help figure out what's wrong?
#17
Hello, what would the next step be?
#18
(Feb-15-2019, 07:07 PM)nikos Wrote: While from withing the python3 script:
Post the code of the script, and tell us what you are using: is it pure WSGI/mod_wsgi or a framework?
Output:
[Fri Feb 15 21:05:15.306351 2019] [wsgi:error] [pid 16432] [remote 176.92.27.182:10408] ['Alexander Lepsveridze', 'John Comeau', '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82', '\xce\x8c\xce\xbc\xce\xb9\xce\xbb\xce\xbf\xcf\x82 \xce\xa4\xcf\x83\xce\xbf\xcf\x84\xcf\x85\xce\xbb\xce\xaf\xce\xbf\xcf\x85']
There is no error here that I can see; it just shows the raw output from the DB.
I can get that output into a variable and then encode/decode it as shown earlier in this thread.
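One detail that helps when reading these log lines: print() on a list shows repr() of each element, not the text itself. A hedged observation: in Python 3, repr() of a str shows a printable character such as 'Î' (U+00CE) as-is and only escapes unprintable ones such as U+0086. A log where every byte appears as \xce, \x86 and so on therefore suggests the values may not be Python 3 str objects at all (for example Python 2 byte strings, if the mod_wsgi process runs a different interpreter than the console). That is an inference, not something confirmed in the thread.

```python
mojibake = '\xce\x86\xce\xba\xce\xb7\xcf\x82'         # mis-decoded text
repaired = mojibake.encode('latin1').decode('utf8')  # proper Greek text

# print() on a list shows repr() of each element, not the text itself.
# Python 3's repr escapes unprintable code points (here U+0086, U+0082)
# but shows printable ones, including Greek letters, as-is.
print([mojibake])
print([repaired])

# To see the plain text, print the strings themselves.
print(', '.join([repaired]))
```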
#19
I'm using WSGI/mod_wsgi in my httpd.conf, but my Python script uses the Bottle framework.
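For reference, this is roughly what "pure WSGI" means at the encoding boundary: the application must return bytes and should declare the charset it used, so the browser decodes the page correctly. Frameworks such as Bottle normally do this step for you. The sketch uses only the standard library; the callable is named application because that is the name mod_wsgi looks for by default, and the hard-coded names stand in for hypothetical database results.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal pure-WSGI app (no Bottle) that returns Greek text safely."""
    # Hypothetical data; in the real app this would come from the database.
    names = ['John Comeau', '\u0386\u03ba\u03b7\u03c2']
    body = '\n'.join(names).encode('utf-8')  # WSGI bodies must be bytes
    start_response('200 OK', [
        # Declaring the charset tells the browser how to decode the bytes.
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ])
    return [body]


if __name__ == '__main__':
    # Call the app directly with a fake environ, without any web server.
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers):
        captured['status'] = status
        captured['headers'] = headers

    chunks = application(environ, start_response)
    print(captured['status'])
    print(b''.join(chunks).decode('utf-8'))
```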

Quote:I can get output in a variable then can encode/decode as show before in post.
What do you mean? It only shows the names correctly in the console? From the script it shows the raw output from the DB, and when we try to encode/decode it like the following:

names = [s.encode('latin1').decode('utf8') for s in names]
it errors out:
Output:
UnicodeEncodeError('latin-1', 'Άκης Τσιάμης', 0, 4, 'ordinal not in range(256)')
#20
What is the raw output of names, before you try to encode it?
Where does it come from in the script? Can you post the code?
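To answer "what is the raw output" precisely, it can help to log the exact type and repr of each value as it comes out of the database, since bytes (shown as b'...'), correctly decoded str, and mis-decoded str all look different. describe is a hypothetical helper and the sample list is made up for illustration.

```python
def describe(values):
    """Return (type name, repr) for each value, e.g. from a DB row."""
    return [(type(v).__name__, repr(v)) for v in values]


# Hypothetical sample: one real bytes value, one mis-decoded str, one ASCII str.
sample = [b'\xce\x86', '\xce\x86', 'John Comeau']
for type_name, text in describe(sample):
    print(type_name, text)
```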
Quote:I'am using WSGI/mod_wsgi in my httpd.conf
These are tools I would not use at all voluntarily.
To give an example of tools that I think are better:
For all local development, use the built-in web server in e.g. Flask or Django.
All the work/testing I do in web development is with Flask's built-in web server.
Only occasionally do I have something I want to host (share with the world).

If I wanted to host (share with the world), I would use Gunicorn with NGINX (better and easier to set up than Apache).
Or go serverless with AWS Lambda.
Quote:lets you run code without provisioning or managing servers.
So with the setups above, I never touch WSGI/mod_wsgi or Apache's httpd.conf.
Example setup How To Serve Flask Applications with Gunicorn and Nginx on Ubuntu 18.04.

