Python Forum
Who converts data when writing to a database with an encoding different from utf8?

Who converts data when writing to a database with an encoding different from utf8? - AlekseyPython - Mar-04-2019

Python 3.7.2

I write strings from my Python code into my database. The strings contain Latin and Cyrillic characters, so the database uses the single-byte encoding koi8-r. Surprisingly, the strings are written to the database without distortion, even though UTF-8 and KOI8-R encode characters completely differently (unlike ASCII and UTF-8, which use the same byte values for ASCII characters). But sometimes characters from other scripts appear in the text, and then the writes fail with errors.
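A quick check in plain Python illustrates both observations: the same Latin and Cyrillic text becomes different bytes in UTF-8 and KOI8-R (Python's codec name is "koi8_r"), while a character with no KOI8-R code point cannot be encoded at all. The CJK character is just an arbitrary example of "another script"; this sketch uses only the built-in codecs and does not touch the database.

```python
# A string mixing Latin and Cyrillic: every character exists in KOI8-R
text = "Hello, Привет"

utf8_bytes = text.encode("utf-8")   # Cyrillic letters take 2 bytes each
koi8_bytes = text.encode("koi8_r")  # every character takes 1 byte

print(len(utf8_bytes), len(koi8_bytes))  # 19 13
# Different byte values, yet both decode back to the same str
print(utf8_bytes.decode("utf-8") == koi8_bytes.decode("koi8_r"))  # True

# A character with no KOI8-R code point makes encoding fail
try:
    "Привет, 中".encode("koi8_r")
except UnicodeEncodeError as exc:
    print("not in KOI8-R:", exc.object[exc.start:exc.end])
```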

This raises two questions:
1. Who converts the strings: the database, or the aiomysql library that I use to write to it?
2. What is the fastest way, in Python or MariaDB, to remove non-koi8-r characters and avoid these errors?
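On the Python side, one possible approach (not benchmarked, so not necessarily the fastest) is to round-trip the string through the built-in koi8_r codec with a lenient error handler. The function names below are hypothetical helpers for illustration.

```python
def strip_non_koi8(text: str) -> str:
    # errors="ignore" drops every character that has no KOI8-R code point;
    # decoding turns the resulting bytes back into a str
    return text.encode("koi8_r", errors="ignore").decode("koi8_r")


def replace_non_koi8(text: str) -> str:
    # errors="replace" substitutes "?" for each unencodable character instead
    return text.encode("koi8_r", errors="replace").decode("koi8_r")


sample = "Привет, world 中文!"
print(strip_non_koi8(sample))    # Привет, world !
print(replace_non_koi8(sample))  # Привет, world ??!
```

Since "ignore" discards characters silently, "replace" may be preferable when you want to see where something was lost.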

Thank you in advance for participating in the conversation.

Perhaps there are databases that support an encoding that is "economical" for my case: a multibyte encoding that stores Latin and Cyrillic characters in one byte each and characters from other scripts in additional bytes?
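To put numbers on the "economical" question, this sketch measures how many bytes UTF-8 and KOI8-R spend on a few sample strings. It only uses Python's codecs and says nothing about how a particular database lays out rows on disk; `encoded_sizes` is a hypothetical helper.

```python
from typing import Optional, Tuple


def encoded_sizes(s: str) -> Tuple[int, Optional[int]]:
    """Return (UTF-8 byte length, KOI8-R byte length or None if unencodable)."""
    utf8_len = len(s.encode("utf-8"))
    try:
        koi8_len: Optional[int] = len(s.encode("koi8_r"))
    except UnicodeEncodeError:
        koi8_len = None  # the string cannot be stored in KOI8-R at all
    return utf8_len, koi8_len


for script, s in {"Latin": "Hello", "Cyrillic": "Привет", "CJK": "中文"}.items():
    utf8_len, koi8_len = encoded_sizes(s)
    print(f"{script}: {len(s)} chars, UTF-8 {utf8_len} bytes, KOI8-R {koi8_len}")
```

In UTF-8, Latin letters cost 1 byte and Cyrillic letters 2 bytes, so Cyrillic-heavy text is roughly twice the size of its KOI8-R form.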


RE: Who converts data when writing to a database with an encoding different from utf8? - DeaD_EyE - Mar-04-2019

If I understood the code correctly, the query is implicitly encoded as utf8mb4 if no other default charset has been set.
If I'm right, each character then takes up to 4 bytes (utf8mb4 is a variable-length encoding: 1 byte for ASCII, 2 bytes for Cyrillic, up to 4 bytes for other characters). This may be related to how MySQL itself works internally.

Where the encoding happens: https://github.com/aio-libs/aiomysql/blob/master/aiomysql/connection.py#L426
DEFAULT_CHARSET comes from an external dependency, PyMySQL: https://github.com/PyMySQL/PyMySQL/blob/master/pymysql/connections.py#L91

If you have a str, it is already decoded: it holds Unicode characters, not bytes in any particular encoding (CPython does not store str internally as UTF-8).
If your input was encoded in koi8-r, a conversion must happen somewhere.
For example, if you submit form data on a web page, you receive the parameters as raw bytes.
They then need to be decoded to be represented as str.
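The decoding step might look like this, assuming the incoming bytes really are KOI8-R (the raw value here is fabricated by encoding a literal). Decoding with the wrong codec, for example latin-1, does not necessarily raise an error; it can silently produce garbled text.

```python
# Hypothetical raw input: bytes as they might arrive from a KOI8-R source
raw = "Привет".encode("koi8_r")

text = raw.decode("koi8_r")  # bytes -> str using the encoding the data was written in
print(text)                  # Привет

# latin-1 accepts any byte, so the wrong codec "succeeds" and yields mojibake
wrong = raw.decode("latin-1")
print(wrong == text)         # False
```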

If the query is a str, it is automatically encoded as utf8mb4 unless another charset has been set somewhere.
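A rough simulation of the whole path in plain Python, assuming the usual MySQL/MariaDB behaviour: the driver sends the query in the connection charset (utf8mb4 by default here), and the server converts the value to the column's charset (koi8r) when storing it. `MYSQL_TO_PYTHON` and `simulate_insert` are hypothetical names, and no database is contacted. Under this model, a character with no KOI8-R code point fails at the server-side conversion step, which would be consistent with the write errors described in the question.

```python
# Hypothetical mapping from MySQL charset names to Python codec names
MYSQL_TO_PYTHON = {"utf8mb4": "utf-8", "koi8r": "koi8_r"}


def simulate_insert(value, connection_charset="utf8mb4", column_charset="koi8r"):
    conn_codec = MYSQL_TO_PYTHON[connection_charset]
    col_codec = MYSQL_TO_PYTHON[column_charset]
    # Client side: the driver turns the str into bytes in the connection charset
    wire = value.encode(conn_codec)
    # Server side: decode from the connection charset, re-encode for the column;
    # raises UnicodeEncodeError if a character has no code point in the column charset
    stored = wire.decode(conn_codec).encode(col_codec)
    return wire, stored


wire, stored = simulate_insert("Привет")
print(len(wire), len(stored))  # 12 6

try:
    simulate_insert("中")
except UnicodeEncodeError:
    print("conversion to koi8r fails")
```

If this model is right, passing charset="koi8r" when opening the aiomysql connection should move the conversion into the driver, so unencodable characters would fail on the client side instead; that is worth confirming against the aiomysql documentation.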