Python Forum - os.path.exists fails with accented characters


When os.path.exists is used to check a path that contains accented characters, it returns False even though the file exists.

From the command line, the file's existence can be verified using "ls".

import os

>>> os.path.exists("/var/altmusic/Snõõper/Super Snõõper/10  STRETCHING 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3")
False
>>> os.path.exists('/var/altmusic/Snõõper/Super Snõõper/10  STRETCHING 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3')
False
>>> os.path.exists('/var/altmusic/Snõõper/Super\ Snõõper/10\ \ STRETCHING\ 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3')
False
>>> os.path.exists("/var/altmusic/Snõõper/Super\ Snõõper/10\ \ STRETCHING\ 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3")
False
>>> os.path.exists(/var/altmusic/Snõõper/Super\ Snõõper/10\ \ STRETCHING\ 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3)
  File "<stdin>", line 1
    os.path.exists(/var/altmusic/Snõõper/Super\ Snõõper/10\ \ STRETCHING\ 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3)
                   ^
SyntaxError: invalid syntax
>>>
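
A note on the backslash variants above: "\ " is shell quoting, not Python syntax. Inside a Python string literal the backslash stays in the string, so those calls look for a path that literally contains backslashes. Below is a minimal sketch of the difference. The path is hypothetical (not the real one from this post), and the standard-library shlex module is used only to mimic how a POSIX shell would parse it.

```python
import shlex

# Hypothetical path for illustration; the raw string keeps the backslashes as typed.
shell_style = r"/var/music/Super\ Band/10\ \ TRACK.mp3"

# Roughly what a POSIX shell hands to a program such as ls: backslashes removed,
# both spaces before TRACK preserved.
unescaped = shlex.split(shell_style)[0]

print(unescaped)
print("\\" in shell_style)  # the Python string itself still contains backslashes
print("\\" in unescaped)    # the shell-parsed version does not
```

So the plain quoted forms (the first two calls) are the right ones to pass to os.path.exists; if they return False, the cause lies elsewhere.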

I am running Python 3.10.12 on Linux Mint 21 (Vanessa).

Any help is greatly appreciated.

Cheers!
Glenn

This works for me

from pathlib import Path
import os

# Set path to the directory that contains the executing script.
# If your files live somewhere else, use something like:
# path = Path('/')
# file = 'path/to/my/file'

path = Path(__file__).parent


txt_file = 'Snõõper/Super/Super Snõõper/file.txt'

apath = f'{path}/{txt_file}'

print(os.path.exists(apath))

Output:
True

I think the problem is what comes after /var/altmusic/Snõõper/Super Snõõper/. What does this print?
>>> os.path.exists("/var/altmusic/Snõõper/Super Snõõper")
I would find it odd that there are two spaces between the track number (10) and the track name. Is "STRETCHING" part of the track name or is the track name 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3?

The only way I could get True was to use pathlib. When I used a hardcoded path, I got False.

I have no trouble using os.path. For example, I have "Snõõper/file.txt" relative to the current working directory.

Output:
>>> import os
>>> os.path.exists("Snõõper/file.txt")
True

Using an absolute path.

Output:
>>> import os
>>> os.path.exists("C:/Users/djhys/Documents/python/musings/Snõõper/file.txt")
True

Use the "/" operator with pathlib.Path objects.

from pathlib import Path

print((Path(__file__).parent / 'Snõõper/file.txt').exists())

I gave this a shot too and it failed.

(Aug-26-2024, 09:12 PM)menator01 Wrote: This works for me

(Aug-26-2024, 09:34 PM)deanhystad Wrote: I think the problem is what comes after /var/altmusic/Snõõper/Super Snõõper/. What does this print?
>>> os.path.exists("/var/altmusic/Snõõper/Super Snõõper")

This fails too:

os.path.exists("/var/altmusic/Snõõper/Super Snõõper")

Output:
False

I did the following and it seems that the problem only exists when there are special characters like "õ".

os.path.exists("/var/altmusic/Snõõper")
False

os.path.exists("/var/altmusic/")
True

Quote:
/var/altmusic/Snõõper/Super Snõõper/10  STRETCHING 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3
I would find it odd that there are two spaces between the track number (10) and the track name. Is "STRETCHING" part of the track name or is the track name 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3?

While it is odd, it is not unheard of. I am iterating over a flat-file database and extracting audio track information. The operators only see the UI and have no idea of properly formatted paths in the underlying OS.

I think it has something to do with the character encoding. The output of "locale" on the machine is:

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

The database file shows that it is encoded UTF-8.
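
One way to test the encoding idea, offered as a sketch rather than a diagnosis: a UTF-8 locale and a UTF-8 database can still disagree with the disk if the on-disk name uses a different Unicode normalization form (for example "o" followed by a combining tilde instead of the single character "õ"). Both print identically, but they are different strings, and on most Linux filesystems they are different file names. The helper names below are made up for illustration; only os.listdir and unicodedata.normalize are real library calls.

```python
import os
import unicodedata


def same_after_normalization(a, b):
    # True if the two names are equal once both are converted to NFC,
    # even when their raw code points differ.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def find_normalized_matches(directory, wanted_name):
    # Return the entry names exactly as stored on disk that match
    # wanted_name after normalization.
    return [
        entry
        for entry in os.listdir(directory)
        if same_after_normalization(entry, wanted_name)
    ]


# Example use (directory and name taken from this thread):
# for name in find_normalized_matches("/var/altmusic", "Snõõper"):
#     print(ascii(name))  # ascii() exposes the exact code points stored on disk
```

If ascii() shows something like 'Sno\u0303o\u0303per' while the string from the database is 'Sn\xf5\xf5per', the names differ only in normalization, and passing the on-disk name returned by the helper to os.path.exists should work.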

Cheers!

I haven't looked too hard at pathlib yet as the only paths having problems have special characters. I will do so now though.

Cheers!

(Aug-26-2024, 09:49 PM)menator01 Wrote: Only way I could get True was to use pathlib. If I just used hardcode path I got False

pathlib has the same problem, as does subprocess.run.

I am starting to think it is a problem with the interaction between Python and the shell.

Cheers!

(Aug-27-2024, 01:29 PM)glenndrives Wrote: I haven't looked too hard at pathlib yet as the only paths having problems have special characters. I will do so now though.

It could be that the file name itself is wrongly encoded.
I have often had this issue with Chinese songs, where it caused a UnicodeEncodeError.

for p in Path().rglob("*"):
    print(p)

Error:
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcfc' in position 118: surrogates not allowed

To get a list of bad paths, run the following code in your home directory:

import time
from pathlib import Path

files_with_bad_encoding = []

for p in Path().rglob("*"):
    try:
        p.name.encode()
    except UnicodeEncodeError:
        files_with_bad_encoding.append(p)


print(f"Found {len(files_with_bad_encoding)} paths with wrong encoding")
time.sleep(2)

print("Listing affected files")
for path in files_with_bad_encoding:
    print(path.parts)

You can't print the Path or the affected part of the path directly, because that raises the UnicodeEncodeError again. Instead, the code prints path.parts, which is a tuple. Printing a tuple shows the repr() of each part, and repr() escapes the surrogate characters instead of encoding them, so the exception is not raised again.

This kind of error is the most annoying in the whole Python ecosystem!
I don't understand why this issue has not been fixed.
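
As an alternative to printing path.parts, here is a minimal sketch of a helper that turns any path into a printable string. It assumes a POSIX system with a UTF-8 filesystem encoding, where Python represents undecodable file-name bytes as lone surrogates (the "surrogateescape" error handler). The function name printable_path is made up for illustration.

```python
import os


def printable_path(path):
    # os.fsencode() reverses the surrogateescape decoding and returns the
    # original bytes of the file name as stored on disk.
    raw = os.fsencode(path)
    # Decode again, but render any invalid bytes as backslash escapes
    # instead of raising an error.
    return raw.decode("utf-8", errors="backslashreplace")


# A name containing the surrogate from the error message above ('\udcfc'),
# which stands for the raw byte 0xFC.
print(printable_path("M\udcfcller.mp3"))
```

The escaped byte (0xFC here) also hints at the original encoding: 0xFC is "ü" in Latin-1, which suggests the name was written by a tool that did not use UTF-8.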
