Python Forum

Full Version: encode/decode to show correct country letters in a CTk combobox
Hi.
I have a text file encoded as UTF-8.
Opening it in, e.g., Notepad shows "sør-trøndelag".
Note the Norwegian character "ø".

Reading the UTF-8 encoded text file in binary mode in Python shows:
"115 195 184 114 45 116 114 195 184 110 100 101 108 97 103"
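For reference, that byte listing can be reproduced from the string itself. A minimal sketch; the variable names are only illustrative:

```python
# Encode the place name as UTF-8 and list the byte values.
name = "sør-trøndelag"
utf8_values = list(name.encode("utf-8"))
print(" ".join(str(b) for b in utf8_values))

# "ø" (U+00F8) takes two bytes in UTF-8: 195 184 (0xC3 0xB8).
oe_bytes = list("ø".encode("utf-8"))
print(oe_bytes)
```

The two-byte pair 195 184 is what later shows up as two separate characters when the bytes are decoded with a single-byte codec.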

A CTkComboBox receives the content of the text file above.
Its output (viewed in the drop-down list) is:
"sÃ¸r-trÃ¸ndelag"
Decoding the binary file content as utf-8 shows the correct letter ø in the combobox:
sør-trøndelag

Decoding as ansi or as latin-1 both show:
sÃ¸r-trÃ¸ndelag

So utf-8 should be used here(?)

Which side of the combobox needs coding to get the proper strings shown in its list?
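One way to look at this question: the combobox only displays the str objects it is given, so the decoding has to happen when the file is read. The following is a minimal sketch, not a definitive implementation; `read_values` and the file name `regions.txt` are hypothetical, and a temporary file stands in for the real text file.

```python
import tempfile
from pathlib import Path

def read_values(path, encoding="utf-8"):
    """Return the non-empty lines of a text file as a list of str."""
    with open(path, encoding=encoding) as f:
        return [line.strip() for line in f if line.strip()]

# Stand-in for the real text file: write the same UTF-8 bytes to a temp file.
demo_path = Path(tempfile.mkdtemp()) / "regions.txt"
demo_path.write_bytes("sør-trøndelag\n".encode("utf-8"))

values = read_values(demo_path)            # decoded with the right codec
wrong = read_values(demo_path, "latin-1")  # same bytes, wrong codec

print(values)
print(wrong)

# The correctly decoded list is what would be handed to the widget, e.g.
# customtkinter.CTkComboBox(parent, values=values)
```

Passing `encoding="utf-8"` explicitly means the result does not depend on the system's default code page.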

Attached is an image in case the text above does not display correctly.

Thank you in advance.

I tried saving the garbled sÃ¸r-trÃ¸ndelag into a new ansi-encoded text file, and it shows the correct/expected result sør-trøndelag.
Should I encode the list sent to the combobox as, e.g., ansi? (I don't know which encoding Python uses.)
Edit 2: I tried typing sør-trøndelag into the combobox. It was shown in the combobox unchanged.
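The ANSI round trip described above probably works because saving as ANSI turns each wrongly decoded character back into its original byte. A minimal sketch, assuming "ansi" here means Windows code page 1252 (`cp1252` in Python):

```python
garbled = "sÃ¸r-trÃ¸ndelag"  # UTF-8 bytes that were decoded as cp1252

# Encoding with the codec that was wrongly used for decoding
# restores the original bytes ...
original_bytes = garbled.encode("cp1252")

# ... which can then be decoded with the codec they were written in.
repaired = original_bytes.decode("utf-8")
print(repaired)
```

This is useful for diagnosing what went wrong; decoding the file as UTF-8 in the first place avoids the need for any repair.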

[attachment=2518]
text = bytes([115, 195, 184, 114, 45, 116, 114, 195, 184, 110, 100, 101, 108, 97, 103])
print(str(text.decode("latin1")))
print(str(text.decode("utf8")))
Output:
sÃ¸r-trÃ¸ndelag
sør-trøndelag
latin1 encoding is the wrong choice.

When I copied the string from the website and pasted it into a text file, I got these bytes:
[115, 248, 114, 45, 116, 114, 248, 110, 100, 101, 108, 97, 103]
When I run this:
text = bytes([115, 248, 114, 45, 116, 114, 248, 110, 100, 101, 108, 97, 103])
print(str(text.decode("latin1")))
print(str(text.decode("utf8")))
Output:
sør-trøndelag
Traceback (most recent call last):
  File "c:\...test.py", line 3, in <module>
    print(str(text.decode("utf8")))
              ^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 1: invalid start byte
This time the string was encoded using latin1, so decoding it as utf8 fails.

It is frustrating, but don't blame Python or customtkinter. A lot of the blame has to go to Windows, which doesn't really know what to do with extended characters. I think there is some code that spins a wheel to pick a random encoding. utf8 nearly always works.
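When the encoding of incoming bytes is not known, one common workaround is to try UTF-8 first and fall back to a single-byte codec. This is only a sketch: `decode_text` is a hypothetical helper, and the latin-1 fallback is an assumption that will not be right for every file.

```python
def decode_text(raw: bytes) -> str:
    """Decode bytes as UTF-8, falling back to Latin-1 if that fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never raises,
        # but the result is only right if the data really is Latin-1.
        return raw.decode("latin-1")

# The two byte sequences from the examples above.
utf8_bytes = bytes([115, 195, 184, 114, 45, 116, 114, 195, 184, 110, 100, 101, 108, 97, 103])
latin1_bytes = bytes([115, 248, 114, 45, 116, 114, 248, 110, 100, 101, 108, 97, 103])

from_utf8 = decode_text(utf8_bytes)
from_latin1 = decode_text(latin1_bytes)
print(from_utf8, from_latin1)
```

Trying UTF-8 first is the safer order, because text in a single-byte encoding that contains non-ASCII characters is very unlikely to be valid UTF-8 by accident, whereas latin-1 accepts any byte sequence.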
[Quote of deanhystad's reply above, posted Sep-02-2023, 04:35 AM]

Hi, the issue is gone. UTF-8 wasn't enabled in Windows 10. A Google search for "howto set utf-8 system in win10" turned up a description of the checkbox to set.
thanks Wall
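To see which encoding Python uses by default on a given machine, something like the following may help. It is a small sketch using only the standard library; in current Python versions `open()` without an `encoding=` argument uses the locale's preferred encoding, which on Windows is often a code page such as cp1252 unless the system-wide UTF-8 setting or Python's UTF-8 mode is enabled.

```python
import locale
import sys

# Encoding that open() falls back to when no encoding= argument is given.
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

# 1 if Python's UTF-8 mode is on (python -X utf8, or PYTHONUTF8=1), else 0.
utf8_mode = sys.flags.utf8_mode
print("UTF-8 mode:", utf8_mode)
```

Passing `encoding="utf-8"` to `open()` explicitly makes a script independent of either setting.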