Sep-03-2024, 04:43 PM
Looking at this further, you have hit on something I was wondering about.
I used command completion in the shell to get the location of the file and took a look at the path returned with Hexedit. The OS is representing the characters as:
U+00F5 õ 0xc3 0xb5 LATIN SMALL LETTER O WITH TILDE
Python is representing the characters as:
U+006F o 0x6f LATIN SMALL LETTER O, U+0303 ̃ 0xcc 0x83 COMBINING TILDE
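The two byte sequences above are both valid UTF-8. They differ in Unicode normalization form: U+00F5 is the precomposed (NFC) form of "õ", and "o" + U+0303 is its decomposed (NFD) form. The short sketch below uses only the standard library and the exact bytes from the hexdump. It shows that the strings look the same, compare unequal, and become equal once both are normalized to the same form.

```python
import unicodedata

# The two byte sequences reported above, decoded as UTF-8.
os_name = b"\xc3\xb5".decode("utf-8")        # U+00F5, precomposed "õ"
python_name = b"o\xcc\x83".decode("utf-8")   # U+006F + U+0303, "o" + combining tilde

# They render the same but are different strings.
print(os_name == python_name)                    # False
print([f"U+{ord(c):04X}" for c in os_name])      # ['U+00F5']
print([f"U+{ord(c):04X}" for c in python_name])  # ['U+006F', 'U+0303']

# NFC composes and NFD decomposes. After normalizing, they compare equal.
print(unicodedata.normalize("NFC", python_name) == os_name)  # True
print(unicodedata.normalize("NFD", os_name) == python_name)  # True
```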
Right now I am trying to figure out how to either get Linux Mint to accept the Python encoding or to get Python to output what Linux Mint is expecting.
I am using the Bash shell. I get the same results with the Fish shell.
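Linux file systems such as ext4 compare file names as raw bytes. An NFD name and an NFC name therefore count as two different files, whichever shell is used. The sketch below shows two possible workarounds. It assumes the name on disk is NFC, as the hexdump suggests, and that the string in Python is NFD. The helper `find_on_disk` is hypothetical, written only for this illustration. The first option normalizes the string to NFC before using it as a path. The second matches the name against the directory listing after normalizing both sides.

```python
import tempfile
import unicodedata
from pathlib import Path
from typing import Optional


def find_on_disk(path: Path) -> Optional[Path]:
    """Hypothetical helper: return the entry in path.parent whose name equals
    path.name once both are NFC-normalized, or None if there is no match."""
    wanted = unicodedata.normalize("NFC", path.name)
    for entry in path.parent.iterdir():
        if unicodedata.normalize("NFC", entry.name) == wanted:
            return entry
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Create a file whose name uses the precomposed (NFC) "õ", U+00F5.
    nfc_name = "m\u00f5.txt"
    (Path(tmp) / nfc_name).write_text("hello")

    # The same name in the form Python was holding it: NFD, "o" + U+0303.
    nfd_path = Path(tmp) / unicodedata.normalize("NFD", nfc_name)
    # On byte-comparing file systems such as ext4 this is False.
    print("NFD path exists:", nfd_path.exists())

    # Option 1: normalize the string to NFC before using it as a path.
    nfc_path = nfd_path.with_name(unicodedata.normalize("NFC", nfd_path.name))
    nfc_exists = nfc_path.exists()
    print("NFC path exists:", nfc_exists)

    # Option 2: look the name up in the directory listing.
    found = find_on_disk(nfd_path)
    print("found:", found)
```

Option 1 is enough when the on-disk form is known. Option 2 is more forgiving when you cannot be sure which form each tool wrote.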
Cheers!
(Sep-01-2024, 12:59 PM) DeaD_EyE wrote: It could be a problem with the wrong encoding of the file name itself.
I often had issues with Chinese songs, which caused a UnicodeEncodeError:

```python
from pathlib import Path

for p in Path().rglob("*"):
    print(p)
```

Error: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcfc' in position 118: surrogates not allowed

To get a list of bad paths, run the following code in your home directory:

```python
import time
from pathlib import Path

files_with_bad_encoding = []
for p in Path().rglob("*"):
    try:
        # Strict UTF-8 encoding fails if the name contains undecodable bytes
        p.name.encode()
    except UnicodeEncodeError:
        files_with_bad_encoding.append(p)

print(f"Found {len(files_with_bad_encoding)} paths with wrong encoding")
time.sleep(2)
print("Listing affected files")
for path in files_with_bad_encoding:
    # Printing the tuple uses repr() for each part, which escapes surrogates
    print(path.parts)
```

You can't print the Path or the affected part of the path directly, because that raises the UnicodeEncodeError again. Instead, the code prints the parts of the path, which form a tuple. A tuple is printed with the repr() of each part, which shows the problem characters as escapes such as '\udcfc', so the exception is not raised again.
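The '\udcfc' in the error message comes from Python's "surrogateescape" error handler. On Linux with a UTF-8 locale, Python decodes file names as UTF-8 with surrogateescape, and each byte that is not valid UTF-8 becomes a lone surrogate in the range U+DC80..U+DCFF. So '\udcfc' stands for the raw byte 0xfc. The sketch below recovers the original bytes and re-decodes them. It assumes the name was written in Latin-1, where 0xfc is "ü"; the file name "Müller.mp3" is made up for the demo, and the real legacy encoding of an affected file may differ. The explicit codec calls mirror what os.fsdecode/os.fsencode do on such a Linux system.

```python
# Hypothetical on-disk bytes: Latin-1 "Müller.mp3" (0xfc is not valid UTF-8 here).
raw = b"M\xfcller.mp3"

# How Python decodes a file name on Linux with a UTF-8 locale.
name = raw.decode("utf-8", "surrogateescape")
print(repr(name))  # 'M\udcfcller.mp3'

# A strict encode fails, just as print() did above.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode fails:", exc.reason)  # surrogates not allowed

# Encoding with surrogateescape restores the original bytes exactly.
restored = name.encode("utf-8", "surrogateescape")
assert restored == raw

# Re-decode with the encoding the name is ASSUMED to have been written in.
fixed = restored.decode("latin-1")
print(fixed)  # Müller.mp3
```

To rename such a file, you could pass the restored bytes and a re-encoded name to os.rename, once you have confirmed the actual legacy encoding.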
This kind of error is the most annoying in the whole Python ecosystem!
I don't understand why this issue has not been fixed.