encode/decode to show correct country letters in a CTk combobox

janeik · (This post was last modified: Sep-02-2023, 02:03 AM by janeik.)

Hi.
I have a text file encoded as UTF-8.
Opening it in e.g. Notepad shows "sør-trøndelag".
Note the Norwegian character "ø".

Reading the UTF-8 encoded text file in binary mode in Python shows these byte values:
"115 195 184 114 45 116 114 195 184 110 100 101 108 97 103"

A CTkComboBox receives the content of the text file above.
Its output (viewing the list content) is:
"sÃ¸r-trÃ¸ndelag"
Decoding the binary data as utf-8 shows the correct letter ø in the combobox:
sør-trøndelag

Decoding as ansi or latin-1 both show:
sÃ¸r-trÃ¸ndelag

So, utf-8 should be used here(?)

On which side of the combobox do I need to encode/decode to get the proper strings shown in its list?

Attached is an image in case the text above does not display correctly.

Thank you in advance.

I tried saving sÃ¸r-trÃ¸ndelag into a new ansi encoded text file, and it shows the correct/expected result sør-trøndelag.
Should I encode the list sent to the combobox as e.g. ansi? (I don't know what encoding Python uses.)
Edit 2: I tried entering sÃ¸r-trÃ¸ndelag into the combobox. It was shown in the combobox unchanged.

**deanhystad** · Sep-02-2023, 04:35 AM

text = bytes([115, 195, 184, 114, 45, 116, 114, 195, 184, 110, 100, 101, 108, 97, 103])
print(str(text.decode("latin1")))
print(str(text.decode("utf8")))

Output:
sÃ¸r-trÃ¸ndelag
sør-trøndelag

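latin1 is the wrong choice for these bytes. The pair 195 184 is the two-byte UTF-8 encoding of ø, and latin1 turns each of those bytes into a separate character.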
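For what it's worth, here is a small sketch of how the garbled text comes about and how it can be undone. It assumes the garbling came from decoding UTF-8 bytes as latin1; the variable names are only for illustration.

```python
# UTF-8 stores "ø" (U+00F8) as the two bytes 195 184 (0xC3 0xB8).
raw = "sør-trøndelag".encode("utf-8")

# Decoding those bytes as latin1 maps every byte to its own character,
# so 195 becomes "Ã" and 184 becomes "¸".
garbled = raw.decode("latin1")
print(garbled)

# latin1 maps each byte value to exactly one character, so the mistake is
# reversible: encode back to the original bytes, then decode as UTF-8.
repaired = garbled.encode("latin1").decode("utf-8")
print(repaired)
```

The repair step only works when the wrong decode was lossless, which is the case for latin1 but not for every code page.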

When I copied the string from the website and pasted it into a text file, I got these bytes:

[115, 248, 114, 45, 116, 114, 248, 110, 100, 101, 108, 97, 103]

When I run this:

text = bytes([115, 248, 114, 45, 116, 114, 248, 110, 100, 101, 108, 97, 103])
print(str(text.decode("latin1")))
print(str(text.decode("utf8")))

Output:
sør-trøndelag
Traceback (most recent call last):
  File "c:\...test.py", line 3, in <module>
    print(str(text.decode("utf8")))
              ^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 1: invalid start byte

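This time the string was encoded using a latin encoding, where ø is the single byte 248 (0xf8). That byte is not valid on its own in UTF-8, hence the error.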

It is frustrating, but don't blame Python or customtkinter. A lot of the blame has to go to Windows, which doesn't really know what to do with extended characters. I think there is some code that spins a wheel to pick a random encoding. utf8 nearly always works.
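If the files can arrive in either encoding, one option is to try UTF-8 first and fall back only when decoding fails. This is just a sketch: `decode_text` is a made-up helper name, and cp1252 as the fallback is my assumption about what "ansi" means on a Western European Windows install.

```python
def decode_text(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode bytes as UTF-8, falling back to a legacy code page on failure."""
    try:
        # Strict UTF-8 rejects a lone 0xF8 byte, so a latin-style file fails
        # here instead of silently producing wrong characters.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback; adjust to whatever code page the file really uses.
        return raw.decode(fallback)


utf8_bytes = bytes([115, 195, 184, 114, 45, 116, 114, 195, 184, 110, 100, 101, 108, 97, 103])
ansi_bytes = bytes([115, 248, 114, 45, 116, 114, 248, 110, 100, 101, 108, 97, 103])

print(decode_text(utf8_bytes))
print(decode_text(ansi_bytes))
```

The order matters. A latin1 decode never raises an error, so trying it first would hide the problem and bring the garbled text back.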

janeik · Sep-02-2023, 09:46 AM

Hi, the issue is gone. UTF-8 wasn't enabled in Windows 10. A Google search for "howto set utf-8 system in win10" turned up a description of the checkbox to set.
Thanks.
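
For anyone who would rather not change the system setting: when `open()` is called without an `encoding` argument, Python normally uses the locale's preferred encoding, which on many Windows installs is a legacy code page rather than UTF-8. Below is a sketch that passes the encoding explicitly instead. `load_values`, the file name and the demo entries are placeholders made up for illustration; the returned list is what I would expect to hand to the combobox as its `values`.

```python
import locale
import os
import tempfile


def load_values(path: str) -> list[str]:
    """Read one combobox entry per line, always decoding the file as UTF-8."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# The encoding open() would normally use if none were given.
print(locale.getpreferredencoding(False))

# Demo with a throwaway UTF-8 file standing in for the real text file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "counties.txt")  # placeholder file name
    with open(path, "w", encoding="utf-8") as f:
        f.write("sør-trøndelag\nnord-trøndelag\n")  # demo data only
    values = load_values(path)

print(values)
```

With the encoding stated in the call, the result should not depend on how Windows is configured.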
