os.path.exists returns False for a path that contains accented characters, even though the file exists.
From the command line the file's existence can be verified using "ls".
>>> import os
>>> os.path.exists("/var/altmusic/Snõõper/Super Snõõper/10 STRETCHING 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3")
False
>>> os.path.exists('/var/altmusic/Snõõper/Super Snõõper/10 STRETCHING 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3')
False
>>> os.path.exists('/var/altmusic/Snõõper/Super\ Snõõper/10\ \ STRETCHING\ 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3')
False
>>> os.path.exists("/var/altmusic/Snõõper/Super\ Snõõper/10\ \ STRETCHING\ 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3")
False
>>> os.path.exists(/var/altmusic/Snõõper/Super\ Snõõper/10\ \ STRETCHING\ 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3)
  File "<stdin>", line 1
    os.path.exists(/var/altmusic/Snõõper/Super\ Snõõper/10\ \ STRETCHING\ 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3)
                   ^
SyntaxError: invalid syntax
>>>
I am running Python 3.10.12 on Linux Mint 21 (Vanessa).
Any help is greatly appreciated.
Cheers!
Glenn
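A side note on the backslash attempts above: "\ " is shell quoting, not part of the file name. Inside a Python string literal the backslash is kept as a literal character, so those calls test a different path than the unescaped ones. A minimal sketch of the difference, using the directory names from the question; shlex.split is shown only as one way to undo shell-style escaping if such strings arrive from somewhere else:

```python
import shlex

plain = "/var/altmusic/Snõõper/Super Snõõper"
# Raw string: the backslash stays in the string, just as it does in '...\ ...'
shell_style = r"/var/altmusic/Snõõper/Super\ Snõõper"

print(plain == shell_style)           # the two strings are not equal
print(len(shell_style) - len(plain))  # they differ by the extra backslash

# shlex.split applies POSIX shell quoting rules and drops the escape
unescaped = shlex.split(shell_style)[0]
print(unescaped == plain)
```

This only explains why the escaped variants cannot match. It does not explain why the plain, unescaped path also returns False.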
This works for me
from pathlib import Path
import os
# This sets the path to the directory of the executing script.
# If your files live somewhere else, use something like:
# path = Path('/')
# file = 'path/to/my/file'
path = Path(__file__).parent
txt_file = 'Snõõper/Super/Super Snõõper/file.txt'
apath = f'{path}/{txt_file}'
print(os.path.exists(apath))
Output:
True
I think the problem is what comes after /var/altmusic/Snõõper/Super Snõõper/. What does this print?
>>> os.path.exists("/var/altmusic/Snõõper/Super Snõõper")
I would find it odd that there are two spaces between the track number (10) and the track name. Is "STRETCHING" part of the track name or is the track name 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3?
The only way I could get True was to use pathlib. When I used a hardcoded path, I got False.
I have no trouble using os.path. For example, I have "Snõõper/file.txt" relative to the current working directory.
Output:
>>> import os
>>> os.path.exists("Snõõper/file.txt")
True
Using an absolute path.
Output:
>>> import os
>>> os.path.exists("C:/Users/djhys/Documents/python/musings/Snõõper/file.txt")
True
Use the "/" operator with pathlib.Path objects.
from pathlib import Path
print((Path(__file__).parent / 'Snõõper/file.txt').exists())
I gave this a shot too and it failed.
(Aug-26-2024, 09:12 PM) menator01 wrote: This works for me
(Aug-26-2024, 09:34 PM) deanhystad wrote: I think the problem is what comes after /var/altmusic/Snõõper/Super Snõõper/. What does this print?
>>> os.path.exists("/var/altmusic/Snõõper/Super Snõõper")
This fails too:
os.path.exists("/var/altmusic/Snõõper/Super Snõõper")
Output:
False
I did the following and it seems that the problem only exists when there are special characters like "õ".
os.path.exists("/var/altmusic/Snõõper")
False
os.path.exists("/var/altmusic/")
True
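Checking one directory level at a time, as above, can be automated. The following is a minimal sketch, not a fix: first_missing is a hypothetical helper that walks a path and, at the first component that does not exist, returns the names the parent directory really contains. Printing those names with ascii() makes every non-ASCII code point visible, so they can be compared with the string being passed in. The demo builds a throwaway directory tree instead of touching /var/altmusic.

```python
import os
import tempfile
from pathlib import Path


def first_missing(path):
    """Return (missing_path, names_in_parent) for the first component
    that does not exist, or None if the whole path exists."""
    p = Path(path)
    current = Path(p.anchor) if p.anchor else Path(".")
    parts = p.parts[1:] if p.anchor else p.parts
    for part in parts:
        candidate = current / part
        if not candidate.exists():
            names = sorted(os.listdir(current)) if current.is_dir() else []
            return candidate, names
        current = candidate
    return None


# Demo tree: the directory on disk is spelled slightly differently
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "Snõõper", "Super Snooper"))
    result = first_missing(os.path.join(root, "Snõõper", "Super Snõõper"))

missing, siblings = result
print("missing:", ascii(missing.name))
for name in siblings:
    print("present:", ascii(name))
```

Run against the real path, the interesting output is the "present" line for the entry that looks like Snõõper: if its escaped form differs from that of the string passed to os.path.exists, the two names are not the same sequence of characters.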
Quote:/var/altmusic/Snõõper/Super Snõõper/10 STRETCHING 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3
I would find it odd that there are two spaces between the track number (10) and the track name. Is "STRETCHING" part of the track name or is the track name 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3?
While it is odd, it is not unheard of. I am iterating over a flat-file database and extracting audio track information. The operators only see the UI and have no idea of properly formatted paths in the underlying OS.
I am thinking it has something to do with the character encoding. Output from "locale" on the machine is:
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
The database file shows that it is encoded UTF-8.
Cheers!
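One encoding-related possibility worth ruling out (an assumption; nothing in the thread confirms it): the name on disk and the name from the database can both be valid UTF-8 and still use different Unicode normalization forms. "õ" can be stored as the single code point U+00F5 (NFC) or as "o" followed by the combining tilde U+0303 (NFD). The two look identical in a terminal, but they are different strings, and Linux file systems generally compare names byte for byte. A small sketch of the comparison, with same_after_normalizing as a hypothetical helper:

```python
import unicodedata

composed = "Sn\u00f5\u00f5per"      # NFC: each õ is one code point
decomposed = "Sno\u0303o\u0303per"  # NFD: o followed by a combining tilde

print(composed == decomposed)           # plain comparison fails
print(len(composed), len(decomposed))   # different number of code points
print(composed.encode("utf-8"))
print(decomposed.encode("utf-8"))


def same_after_normalizing(a, b, form="NFC"):
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(same_after_normalizing(composed, decomposed))
```

If the helper returns True for a database value and a name from os.listdir() while == returns False, normalizing the database value to the form used on disk would be the next thing to try.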
I haven't looked too hard at pathlib yet as the only paths having problems have special characters. I will do so now though.
Cheers!
(Aug-26-2024, 09:49 PM) menator01 wrote: Only way I could get True was to use pathlib. If I just used hardcode path I got False
pathlib is having the same problem as is subprocess.run.
I am starting to think it is a problem with the interaction between Python and the shell.
Cheers!
(Aug-27-2024, 01:29 PM) glenndrives wrote: I haven't looked too hard at pathlib yet as the only paths having problems have special characters. I will do so now though.
Cheers!
(Aug-26-2024, 09:49 PM) menator01 wrote: Only way I could get True was to use pathlib. If I just used hardcode path I got False
It could be that the file name itself is wrongly encoded.
I have often had issues with Chinese songs whose file names caused a UnicodeEncodeError.
from pathlib import Path

for p in Path().rglob("*"):
    print(p)
Error:
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcfc' in position 118: surrogates not allowed
To get a list of bad paths, run the following code in your home directory:
import time
from pathlib import Path

files_with_bad_encoding = []
for p in Path().rglob("*"):
    try:
        # Strict UTF-8 encoding fails if the name contains lone surrogates
        p.name.encode()
    except UnicodeEncodeError:
        files_with_bad_encoding.append(p)

print(f"Found {len(files_with_bad_encoding)} paths with wrong encoding")
time.sleep(2)
print("Listing affected files")
for path in files_with_bad_encoding:
    # Printing the tuple shows the repr of each part, which escapes the bad characters
    print(path.parts)
You can't print the Path or the affected part of the path directly, because that raises the UnicodeEncodeError again. Instead, the code prints the parts of the path, which is a tuple. Printing a tuple shows the repr of each part, which escapes the offending characters, so the exception is not raised again.
This kind of error is the most annoying in the whole Python ecosystem!
I don't understand why this issue is not fixed.
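For background on where a character like '\udcfc' comes from: on POSIX systems Python decodes file names with the surrogateescape error handler, so a byte that is not valid UTF-8 becomes a lone surrogate in the resulting str. A minimal sketch with a made-up file name (the byte string below is an assumption for illustration: "ü" stored as the single Latin-1 byte 0xFC):

```python
# Hypothetical file name stored in Latin-1; the byte \xfc is not valid UTF-8 here
raw = b"caf\xfc.mp3"

# Roughly what Python does with file names on POSIX systems
name = raw.decode("utf-8", errors="surrogateescape")
print(ascii(name))  # the undecodable byte appears as a lone surrogate

# Strict encoding, as used when printing to a UTF-8 terminal, fails
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print(strict_ok)

# The same handler restores the original bytes, so the file stays reachable
round_trip = name.encode("utf-8", errors="surrogateescape")
print(round_trip == raw)

# A printable form for logs: escape whatever cannot be encoded
safe = name.encode("utf-8", errors="backslashreplace").decode("utf-8")
print(safe)
```

On POSIX, os.fsdecode() and os.fsencode() apply this handler with the file system encoding, which is why such a path can still be opened even though it cannot be printed directly.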