Python Forum
os.path.exists fails with accented characters
#1
os.path.exists returns False when I check a path that contains accented characters, even though the file exists.

From the command line the file's existence can be verified using "ls".

>>> import os
>>> os.path.exists("/var/altmusic/Snõõper/Super Snõõper/10  STRETCHING 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3")
False
>>> os.path.exists('/var/altmusic/Snõõper/Super Snõõper/10  STRETCHING 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3')
False
>>> os.path.exists('/var/altmusic/Snõõper/Super\ Snõõper/10\ \ STRETCHING\ 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3')
False
>>> os.path.exists("/var/altmusic/Snõõper/Super\ Snõõper/10\ \ STRETCHING\ 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3")
False
>>> os.path.exists(/var/altmusic/Snõõper/Super\ Snõõper/10\ \ STRETCHING\ 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3)
  File "<stdin>", line 1
    os.path.exists(/var/altmusic/Snõõper/Super\ Snõõper/10\ \ STRETCHING\ 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3)
                   ^
SyntaxError: invalid syntax
>>>
I am running Python 3.10.12 on Linux Mint 21 Vanessa.
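
A side note on the backslash-escaped attempts above: escaping spaces with a backslash is shell syntax, not Python syntax. Inside a Python string literal, `\ ` is not a recognized escape, so the backslash stays in the string and the path cannot match a name that has no backslash in it. A minimal sketch, assuming the escaped path was copied from a terminal; the shortened file name is a hypothetical stand-in for the real one:

```python
import shlex

# Raw string so the backslashes are kept exactly as the shell would show them.
shell_escaped = r"/var/altmusic/Snõõper/Super\ Snõõper/10\ \ STRETCHING.mp3"

# shlex.split applies POSIX shell quoting rules and returns a list of words.
words = shlex.split(shell_escaped)
python_path = words[0]

print(python_path)
# The escaped form contains backslashes, the converted form does not.
print("\\" in shell_escaped, "\\" in python_path)
```

This only removes one source of confusion; the unescaped attempts in the first two calls fail for a different reason.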

Any help is greatly appreciated.

Cheers!
Glenn
#2
This works for me

from pathlib import Path
import os
 
# Directory that contains the running script.
# If the file lives somewhere else, use something like:
# path = Path('/')
# file = 'path/to/my/file'
 
path = Path(__file__).parent
 
 
txt_file = 'Snõõper/Super/Super Snõõper/file.txt'
 
apath = f'{path}/{txt_file}'
 
print(os.path.exists(apath))
Output:
True
#3
I think the problem is what comes after /var/altmusic/Snõõper/Super Snõõper/. What does this print?
>>> os.path.exists("/var/altmusic/Snõõper/Super Snõõper")
I would find it odd that there are two spaces between the track number (10) and the track name. Is "STRETCHING" part of the track name or is the track name 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3?
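
The same idea can be automated by testing the path one component at a time. This is only a sketch; `first_missing_component` is a hypothetical helper, not something from the thread or the standard library:

```python
import os
from pathlib import Path


def first_missing_component(path):
    """Return the shortest leading part of `path` that does not exist, or None."""
    target = Path(path)
    # `parents` runs from the immediate parent up to the root, so reverse it
    # to test the shortest prefix first and the full path last.
    for candidate in [*reversed(target.parents), target]:
        if not os.path.exists(candidate):
            return candidate
    return None


print(first_missing_component("/var/altmusic/Snõõper/Super Snõõper"))
```

Whatever it prints is the first directory whose name Python cannot match, which is the one worth inspecting.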
Reply
#4
The only way I could get True was to use pathlib. If I just used a hard-coded path, I got False.
#5
I have no trouble using os.path. For example, I have "Snõõper/file.txt" relative to the current working directory.
Output:
>>> import os
>>> os.path.exists("Snõõper/file.txt")
True
Using an absolute path.
Output:
>>> import os
>>> os.path.exists("C:/Users/djhys/Documents/python/musings/Snõõper/file.txt")
True
Use the "/" operator with pathlib.Path objects.
from pathlib import Path
 
print((Path(__file__).parent / 'Snõõper/file.txt').exists())
#6
I gave this a shot too and it failed.

(Aug-26-2024, 09:12 PM)menator01 Wrote: This works for me

#7
(Aug-26-2024, 09:34 PM)deanhystad Wrote: I think the problem is what comes after /var/altmusic/Snõõper/Super Snõõper/. What does this print?
>>> os.path.exists("/var/altmusic/Snõõper/Super Snõõper")

This fails too:
os.path.exists("/var/altmusic/Snõõper/Super Snõõper")
Output:
False
I tried the following, and it seems the problem only occurs when the path contains special characters like "õ".
os.path.exists("/var/altmusic/Snõõper")
False
 
os.path.exists("/var/altmusic/")
True
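
Since /var/altmusic/ exists but the next component does not match, one way to narrow this down is to ask Python what is actually stored in the parent directory. Printing each name with ascii() reveals the real code points: a precomposed õ shows as \xf5, an o followed by a combining tilde shows as o\u0303, and a name whose bytes are not valid UTF-8 shows lone surrogates such as \udcf5. A sketch only; `show_entries` is a hypothetical helper:

```python
import os


def show_entries(directory):
    """Return (ascii form, raw bytes) for each entry in `directory`."""
    rows = []
    for name in sorted(os.listdir(directory)):
        # os.fsencode recovers the exact bytes stored on disk.
        rows.append((ascii(name), os.fsencode(name)))
    return rows


if os.path.isdir("/var/altmusic"):
    for shown, raw in show_entries("/var/altmusic"):
        print(shown, raw)
```

Comparing that output with ascii("Snõõper") typed at the prompt should show whether the two spellings really are the same string.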
Quote:
/var/altmusic/Snõõper/Super Snõõper/10  STRETCHING 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3
I would find it odd that there are two spaces between the track number (10) and the track name. Is "STRETCHING" part of the track name or is the track name 2_TRAILS_SPL_TMM_M1_D_16_44.1.mp3?
While it is odd, it is not unheard of. I am iterating over a flat-file database and extracting audio track information. The operators only see the UI and have no idea what a properly formatted path looks like in the underlying OS.

I think it has something to do with the character encoding. Output from "locale" on the machine is:

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

The database file shows that it is encoded UTF-8.
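
A UTF-8 locale and a UTF-8 database do not rule out a Unicode normalization mismatch. The same visible "õ" can be stored as a single code point (NFC) or as "o" followed by a combining tilde (NFD), and Linux filesystems compare names byte for byte. This is only one possible cause and nothing in the thread confirms it. A minimal sketch that looks up a name by its normalized form; `find_entry` is a hypothetical helper:

```python
import os
import unicodedata


def find_entry(directory, wanted):
    """Return the on-disk name in `directory` that equals `wanted` after
    NFC normalization, or None if nothing matches."""
    target = unicodedata.normalize("NFC", wanted)
    for name in os.listdir(directory):
        if unicodedata.normalize("NFC", name) == target:
            return name
    return None


nfc = "Sn\u00f5\u00f5per"      # precomposed õ
nfd = "Sno\u0303o\u0303per"    # o followed by a combining tilde

print(nfc == nfd)                                # False
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```

If find_entry("/var/altmusic", "Snõõper") returns a name, passing that returned string to os.path.exists should succeed where the typed one did not.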

Cheers!
#8
I haven't looked too hard at pathlib yet as the only paths having problems have special characters. I will do so now though.

Cheers!

(Aug-26-2024, 09:49 PM)menator01 Wrote: The only way I could get True was to use pathlib. If I just used a hard-coded path, I got False.
#9
pathlib has the same problem, as does subprocess.run.

I am starting to think it is a problem with the interaction between Python and the shell.
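
One way to test the shell theory: when subprocess.run is given a list of arguments, no shell is involved and each argument reaches the child process as is. The sketch below, which uses a hypothetical helper `received_argument`, starts a second Python interpreter and reports what it received:

```python
import subprocess
import sys


def received_argument(arg):
    """Run a child Python process and return the ascii() form of the
    argument it received."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(ascii(sys.argv[1]))", arg],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


path = "/var/altmusic/Snõõper/Super Snõõper"
print(received_argument(path) == ascii(path))
```

If this prints True, the string is passed through intact and the mismatch more likely lies between the string and the name stored on disk.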

Cheers!

#10
It could be a problem with wrong encoding of the file name itself.
I have often had issues with Chinese song titles, which caused a UnicodeEncodeError.

from pathlib import Path

for p in Path().rglob("*"):
    print(p)
Error:
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcfc' in position 118: surrogates not allowed
To get a list of bad paths, run the following code in your home directory:
import time
from pathlib import Path
 
files_with_bad_encoding = []
 
for p in Path().rglob("*"):
    try:
        p.name.encode()
    except UnicodeEncodeError:
        files_with_bad_encoding.append(p)
 
 
print(f"Found {len(files_with_bad_encoding)} paths with wrong encoding")
time.sleep(2)
 
print("Listing affected files")
for path in files_with_bad_encoding:
    print(path.parts)
You can't print the Path or the affected part of the path directly, because that raises the UnicodeEncodeError again. Instead, the code prints path.parts, which is a tuple. Printing a tuple shows the repr of each element, and the repr escapes the offending characters, so the exception is not raised again.
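
For background, the '\udcfc' in the error is a lone surrogate. When the bytes of a file name are not valid in the filesystem encoding, Python decodes them with the surrogateescape error handler so that the original bytes can be recovered later (os.fsencode does this on Linux). The sketch below reproduces the situation without touching the disk; the byte string is a hypothetical Latin-1 encoded name:

```python
raw = b"Sn\xf5\xf5per"  # hypothetical Latin-1 bytes, not valid UTF-8

# This mirrors how Python turns undecodable file-name bytes into str.
name = raw.decode("utf-8", "surrogateescape")

# ascii() escapes the lone surrogates instead of failing.
print(ascii(name))

# backslashreplace gives a printable form of the name.
print(name.encode("utf-8", "backslashreplace").decode("ascii"))

# surrogateescape round-trips back to the original bytes.
print(name.encode("utf-8", "surrogateescape") == raw)
```

Decoding the recovered bytes with the encoding they were actually written in, for example raw.decode("latin-1"), gives the intended name.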

This kind of error is the most annoying in the whole Python ecosystem!
I don't understand why this issue has not been fixed.