Python Forum

Hi.
I have a UTF-8 encoded text file.
Opening it in e.g. Notepad shows "sør-trøndelag".
Note the Norwegian character "ø".

Reading the UTF-8 encoded text file in binary mode in Python gives these byte values:
"115 195 184 114 45 116 114 195 184 110 100 101 108 97 103"

A CTkComboBox receives the content of the text file above.
Its output (viewing the list content) is:
"sÃ¸r-trÃ¸ndelag"
Decoding the binary data as utf-8 shows the correct letter ø in the combobox:
sør-trøndelag

Decoding as ansi and as latin-1 both show:
sÃ¸r-trÃ¸ndelag

So utf-8 should be used here(?)

Which side of the combobox needs coding to get the proper strings shown in its list?

An image is attached in case the text above does not display correctly.

Thank you in advance.

Edit: I tried saving sÃ¸r-trÃ¸ndelag into a new ansi-encoded text file, and it shows the correct/expected result, sør-trøndelag.
Should I encode the list sent to the combobox as e.g. ansi? (I don't know what encoding Python uses.)
Edit 2: I tried typing sÃ¸r-trÃ¸ndelag into the combobox. It was shown in the combobox unchanged.

[attachment=2518]

text = bytes([115, 195, 184, 114, 45, 116, 114, 195, 184, 110, 100, 101, 108, 97, 103])
print(text.decode("latin1"))  # wrong codec: every byte becomes its own character
print(text.decode("utf8"))    # right codec: bytes 195 184 together become ø

Output:
sÃ¸r-trÃ¸ndelag
sør-trøndelag

latin1 encoding is the wrong choice.
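
To see why the latin1 result looks the way it does: ø is stored in UTF-8 as the two bytes 195 and 184, and latin1 maps every single byte to exactly one character, so those two bytes come out as Ã and ¸. Here is a minimal sketch. The helper name repair_mojibake is made up for illustration, and the round trip only works when the text was garbled in exactly this way (UTF-8 bytes decoded as latin1).

```python
# ø in UTF-8 is two bytes; latin1 turns each byte into its own character.
utf8_bytes = "ø".encode("utf-8")
print(list(utf8_bytes))             # [195, 184]
print(utf8_bytes.decode("latin1"))  # Ã¸


def repair_mojibake(garbled: str) -> str:
    """Undo a UTF-8-read-as-latin1 mistake (hypothetical helper).

    Encoding with latin1 recovers the original bytes, which can then be
    decoded with the codec that should have been used in the first place.
    """
    return garbled.encode("latin1").decode("utf-8")


# Reproduce the garbled string from the bytes in the original post.
garbled = bytes(
    [115, 195, 184, 114, 45, 116, 114, 195, 184, 110, 100, 101, 108, 97, 103]
).decode("latin1")
print(repair_mojibake(garbled))     # sør-trøndelag
```

Repairing after the fact is a workaround. Decoding with the right codec when the file is read is the cleaner fix.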

When I copied the string from the website and pasted it into a text file, I got these bytes:

[115, 248, 114, 45, 116, 114, 248, 110, 100, 101, 108, 97, 103]

When I run this:

text = bytes([115, 248, 114, 45, 116, 114, 248, 110, 100, 101, 108, 97, 103])
print(str(text.decode("latin1")))
print(str(text.decode("utf8")))

Output:
sør-trøndelag
Traceback (most recent call last):
  File "c:\...test.py", line 3, in <module>
    print(str(text.decode("utf8")))
              ^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 1: invalid start byte

This time the string was encoded using latin1.
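
If you don't know in advance which of the two encodings a file uses, one possible approach is to try utf8 first and fall back to latin1 only when that fails. This is a sketch, not a general detector: the function name decode_text is invented here, and the latin1 fallback is a guess, because latin1 accepts any byte sequence without complaint.

```python
def decode_text(raw: bytes) -> tuple[str, str]:
    """Return (text, codec_used): try UTF-8 first, fall back to latin1."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # latin1 maps every byte value to a character, so this never raises,
        # but it is only a guess about what the file really contains.
        return raw.decode("latin1"), "latin1"


utf8_sample = bytes([115, 195, 184, 114, 45, 116, 114, 195, 184, 110, 100, 101, 108, 97, 103])
latin1_sample = bytes([115, 248, 114, 45, 116, 114, 248, 110, 100, 101, 108, 97, 103])

print(decode_text(utf8_sample))    # ('sør-trøndelag', 'utf-8')
print(decode_text(latin1_sample))  # ('sør-trøndelag', 'latin1')
```

Trying utf8 first is the safer order, since text with extended characters saved as latin1 is rarely valid utf8, while the reverse always "succeeds" and produces garbage.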

It is frustrating, but don't blame Python or customtkinter. A lot of the blame has to go to Windows, which doesn't really know what to do with extended characters. I think there is some code that spins a wheel to pick a random encoding. utf8 nearly always works.
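
One way to take the guessing out of it is to name the encoding yourself whenever you open the file, so the system default never comes into play. A minimal sketch, assuming the file holds one combobox entry per line; the function name read_lines, the file name counties.txt and the second entry are made up for the demo.

```python
import os
import tempfile


def read_lines(path: str, encoding: str = "utf-8") -> list[str]:
    """Read a text file with an explicit encoding, one entry per line."""
    with open(path, encoding=encoding) as f:
        return [line.strip() for line in f if line.strip()]


# Demo: write a small UTF-8 file, then read it back the same way.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "counties.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write("sør-trøndelag\nnord-trøndelag\n")
    values = read_lines(path)

print(values)
# The decoded strings can then be handed to the widget, for example:
# customtkinter.CTkComboBox(root, values=values)
```

The combobox itself needs no encoding step. It only has to be given properly decoded Python strings.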

(Sep-02-2023, 04:35 AM) deanhystad wrote:
latin1 encoding is the wrong choice. [...] utf8 nearly always works.

Hi, the issue is gone. UTF-8 wasn't enabled in Windows 10. A Google search for "howto set utf-8 system in win10" turned up a description of the checkbox to set.
thanks Wall
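
For anyone who lands here and wants to check what Python is using by default before or after changing the Windows setting, here is a small diagnostic sketch. The function name encoding_report is made up. Python also has its own UTF-8 mode (the -X utf8 command-line option or the PYTHONUTF8=1 environment variable), which shows up in the first entry.

```python
import locale
import sys


def encoding_report() -> dict:
    """Collect the settings that decide how Python decodes text by default."""
    return {
        # True when Python was started with -X utf8 or PYTHONUTF8=1.
        "utf8_mode": bool(sys.flags.utf8_mode),
        # What open() falls back to when no encoding argument is given.
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


report = encoding_report()
for name, value in report.items():
    print(f"{name}: {value}")
```

If preferred_encoding is not a UTF-8 variant, any open() call without an explicit encoding argument will misread UTF-8 files in the way described at the top of this thread.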

janeik

deanhystad

janeik