Python Forum
os.path.exists fails with accented characters
#11
Looking at this further you have hit on something I was wondering about.

I used command completion in the shell to get the location of the file and took a look at the path returned with Hexedit. The OS is representing the characters as:

U+00F5 õ 0xc3 0xb5 LATIN SMALL LETTER O WITH TILDE

Python is representing the characters as:

U+006F o 0x6f LATIN SMALL LETTER O
U+0303 ̃ 0xcc 0x83 COMBINING TILDE
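To make the difference between the two representations concrete, here is a minimal sketch that reproduces the code points and UTF-8 bytes listed above. The helper name `describe` is only for illustration; it is not from the original script.

```python
import unicodedata

composed = "\u00f5"      # õ as a single precomposed code point (NFC)
decomposed = "o\u0303"   # o followed by COMBINING TILDE (NFD)


def describe(text):
    """Return (code point, Unicode name, UTF-8 bytes as hex) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8").hex(" "))
        for ch in text
    ]


print(describe(composed))
print(describe(decomposed))

# The two strings render the same but do not compare equal...
print(composed == decomposed)
# ...until both are brought to the same normalization form.
print(unicodedata.normalize("NFC", decomposed) == composed)
print(unicodedata.normalize("NFD", composed) == decomposed)
```

Since Linux file names are byte sequences, the bytes c3 b5 and 6f cc 83 name two different files even though both display as õ.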

Right now I am trying to figure out how to either get Linux Mint to accept the Python encoding or to get Python to output what Linux Mint is expecting.

I am using the Bash shell. I get the same results with the Fish shell.

Cheers!

(Sep-01-2024, 12:59 PM)DeaD_EyE Wrote: It could be a problem with wrong encoding of the file name itself.
I often had issues with Chinese songs, which caused a UnicodeEncodeError.

from pathlib import Path

for p in Path().rglob("*"):
    print(p)
Error:
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcfc' in position 118: surrogates not allowed
To get a list of bad paths, run the following code in your home directory:
import time
from pathlib import Path

files_with_bad_encoding = []

for p in Path().rglob("*"):
    try:
        p.name.encode()
    except UnicodeEncodeError:
        files_with_bad_encoding.append(p)


print(f"Found {len(files_with_bad_encoding)} paths with wrong encoding")
time.sleep(2)

print("Listing affected files")
for path in files_with_bad_encoding:
    print(path.parts)
You can't print the Path or the affected part of the path directly, because that raises the UnicodeEncodeError again. Instead, the code prints path.parts, which is a tuple. Printing a tuple shows the repr() of each element, and repr() escapes the lone surrogates (for example '\udcfc'), so the exception is not raised again.
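As a rough illustration of where the '\udcfc' in that error comes from, here is a sketch that assumes a UTF-8 filesystem encoding with the surrogateescape error handler (the usual default on Linux). The file name `raw_name` and the helper `printable` are made up for the example.

```python
# Hypothetical file name as raw bytes: 0xFC is "ü" in Latin-1 but invalid UTF-8.
raw_name = b"caf\xfc.mp3"

# With surrogateescape, each undecodable byte becomes a lone surrogate,
# so 0xFC turns into '\udcfc' in the resulting str.
name = raw_name.decode("utf-8", "surrogateescape")


def printable(text):
    """Return a str that can always be encoded as UTF-8 by escaping lone surrogates."""
    return text.encode("utf-8", "backslashreplace").decode("utf-8")


try:
    name.encode("utf-8")  # strict encoding fails, as print() would on a UTF-8 terminal
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)

print(printable(name))  # safe to print: the surrogate is shown as an escape sequence

# The original bytes can be recovered with the same error handler.
print(name.encode("utf-8", "surrogateescape") == raw_name)
```

The escaped form is only for display. To open or rename such a file, pass the original str (or the bytes) to the os functions unchanged.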

This kind of error is the most annoying in the whole Python ecosystem!
I don't understand why this issue is not fixed.
#12
It looks like I have solved my problem.

Using the unicodedata library with unicodedata.normalize, I was able to get Python3 to output the Normal Form C version of the UTF-8 characters that the underlying OS was wanting.

From https://docs.python.org/3/library/unicod....normalize:

Quote:The Unicode standard defines various normalization forms of a Unicode string, based on the definition of canonical equivalence and compatibility equivalence. In Unicode, several characters can be expressed in various way. For example, the character U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA) can also be expressed as the sequence U+0043 (LATIN CAPITAL LETTER C) U+0327 (COMBINING CEDILLA).

For each character, there are two normal forms: normal form C and normal form D. Normal form D (NFD) is also known as canonical decomposition, and translates each character into its decomposed form. Normal form C (NFC) first applies a canonical decomposition, then composes pre-combined characters again.

Changing line
commandLineOptions["path"] = finalPath
to
commandLineOptions["path"] = unicodedata.normalize('NFC', finalPath)
resulted in
Output:
(Path Exists): /var/altmusic/Snõõper/Super Snõõper/12 TownTopic_TMM_M1_D_16_44.1.mp3
Thank you all for your help.
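For anyone who does not know in advance which form the names on disk use, a possible variation is to try the path as given and then its NFC and NFD forms. This is only a sketch: `find_existing` is a hypothetical helper, not part of the script above, and the temporary file name stands in for the real music path.

```python
import os
import tempfile
import unicodedata


def find_existing(path):
    """Return the first normalization variant of path that exists, else None."""
    candidates = (
        path,
        unicodedata.normalize("NFC", path),
        unicodedata.normalize("NFD", path),
    )
    for candidate in candidates:
        if os.path.exists(candidate):
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    # File created with precomposed characters (NFC), as the shell reported.
    on_disk = os.path.join(tmp, "Sn\u00f5\u00f5per.mp3")
    open(on_disk, "w").close()

    # Path built with decomposed characters (NFD), as the script produced.
    requested = os.path.join(tmp, "Sno\u0303o\u0303per.mp3")

    found = find_existing(requested)
    print("Path Exists" if found else "Path Missing")
```

Normalizing once at the point where the path is built, as in the fix above, is simpler when the target form is known. The fallback only helps when names on disk may be in either form.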

Cheers!
Glenn