Python Forum

Full Version: pathlib destpath.exists() true even file does not exist
I have a Windows file server (Win Server 2003 Standard x64 Edition) that contains csv files.

On a Debian machine (9.13 Stretch) I mounted a Windows share pointing to abovementioned machine by adding this line in file /etc/fstab:
//192.168.254.10/DATA /mnt/fsdata cifs uid=postgres,username=<*user*>,password=<*pwd*>,iocharset=utf8,sec=ntlm 0 0

I am running a python script on the Debian machine to check if the csv files exist on the windows machine.

On the Windows machine, when I “cut” or “drag” all the files away from the folder and then run Path.exists() on /mnt/fsdata/IT/Servers/PostGres/CDR/, the result is always True. (=incorrect)

On the Windows machine, when I delete all the files and then run Path.exists() on /mnt/fsdata/IT/Servers/PostGres/CDR/, the result is False. (=correct)

It seems that cut or drag is not recognized on Debian side, and python still “sees” the files as being there.

This is the code excerpt:

#!/usr/bin/env python3

import pathlib

src = "/mnt/fsdata/IT/Servers/PostGres/CDR/"
dest = src + "procd/"

srcpath = pathlib.Path(src)

for file in list(srcpath.glob('*.csv')):
    destpath = pathlib.Path(dest + file.name)

    # check if file already exists in /procd folder:
    if destpath.exists():
        pass  # something happens...
What can be done to avoid this?

I searched a lot online but found no answers.

Thank you
pathlib only builds the path (the 'address') where you expect a file to be; building it does not touch the filesystem.
To check whether the file is actually there, call exists().

example (untested):
from pathlib import Path

homepath = Path('.')
datapath = homepath / 'data'
datapath.mkdir(exist_ok=True)    # make data directory (only if it is not already there)
myfile = datapath / 'myfile.txt'

# check if file exists:
if myfile.exists():
    with myfile.open() as fp:
        data = fp.read()    # read from the open file object, not from the Path
else:
    print("myfile.txt does not exist")
I suspect it has something to do with the way samba servers work, or samba clients. Some answers on the web, such as this one indicate that it could be a cache problem. I'm not proficient in these samba issues, but you can perhaps either fine tune the configuration of the shared directory on the Windows side or find a way for the Linux client to force the server to reread the contents of the directory. This link could be helpful too.
(Nov-26-2020, 10:47 PM)Gribouillis Wrote: I suspect it has something to do with the way samba servers work, or samba clients. Some answers on the web, such as this one indicate that it could be a cache problem. I'm not proficient in these samba issues, but you can perhaps either fine tune the configuration of the shared directory on the Windows side or find a way for the Linux client to force the server to reread the contents of the directory. This link could be helpful too.

I added cache=none in /etc/fstab and restarted the machine. Nothing changed.
My opinion is that the problem is on the Python side, because when I ls the folder it is empty. After that, running the script still gives destpath.exists() as True.

I added following lines to try to open files:
with destpath.open() as f:
    print("File name: " + str(destpath))
it returns an error:
Error:
An error occurred: [Errno 2] No such file or directory: '/mnt/fsdata/IT/Servers/PostGres/CDR/procd/Trunks-2020-11-01.csv'
Could it be a bug in python? Who can I contact for that?
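Before blaming Python, it may be worth confirming that the cache=none option added to /etc/fstab is actually in effect for the mount. The sketch below parses text in the /proc/mounts format and returns the options for one mount point. The function name mount_options and the sample line are made up for illustration; on the real machine you would pass in the contents of /proc/mounts instead.

```python
# Hypothetical sample in /proc/mounts format (device, mount point, type, options, dump, pass).
# The option values here are invented for the example, not taken from the real machine.
SAMPLE_MOUNTS = (
    "proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0\n"
    "//192.168.254.10/DATA /mnt/fsdata cifs rw,relatime,vers=1.0,cache=none 0 0\n"
)


def mount_options(mount_point, mounts_text):
    """Return the list of mount options for mount_point, or None if it is not mounted."""
    target = mount_point.rstrip("/") or "/"
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # /proc/mounts escapes spaces inside paths as \040
        where = fields[1].replace("\\040", " ")
        if where == target:
            found = fields[3].split(",")  # keep the last matching entry
    return found


print(mount_options("/mnt/fsdata", SAMPLE_MOUNTS))
```

On the Debian box the call would be mount_options("/mnt/fsdata", open("/proc/mounts").read()). If cache=none does not show up in the result, the fstab change did not reach the live mount.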
You could perhaps first perform a os.stat() call on the file and print all the fields of the returned stat_result object to see what it contains. A bug in Python is by far the least plausible explanation.
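A minimal sketch of that check, assuming you just want the main stat fields in readable form. The helper name describe is hypothetical; it only wraps os.stat() and reports the exception instead of the fields when the call fails.

```python
import os
import stat
import time


def describe(path):
    """Return the main os.stat() fields for path, or the error that os.stat() raised."""
    try:
        st = os.stat(path)
    except OSError as exc:
        # FileNotFoundError, PermissionError etc. are all OSError subclasses
        return {"path": str(path), "error": type(exc).__name__, "errno": exc.errno}
    return {
        "path": str(path),
        "mode": stat.filemode(st.st_mode),   # e.g. '-rwxr-xr-x'
        "inode": st.st_ino,
        "size": st.st_size,
        "mtime": time.ctime(st.st_mtime),
        "ctime": time.ctime(st.st_ctime),
    }
```

Calling describe() on each destination path right before and right after moving the files on the Windows side should show whether the client keeps returning the old attributes.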
stats are as follows:
Output:
os.stat_result(st_mode=33261, st_ino=1407374883714381, st_dev=41, st_nlink=1, st_uid=118, st_gid=0, st_size=1621, st_atime=1606730312, st_mtime=1606730312, st_ctime=1606730608)
os.stat_result(st_mode=33261, st_ino=1125899907003726, st_dev=41, st_nlink=1, st_uid=118, st_gid=0, st_size=6249, st_atime=1606730312, st_mtime=1606730312, st_ctime=1606730608)
os.stat_result(st_mode=33261, st_ino=1125899907003737, st_dev=41, st_nlink=1, st_uid=118, st_gid=0, st_size=8594, st_atime=1606730312, st_mtime=1606730312, st_ctime=1606730608)
etc...
This kind of check very often leads to problems, more often than you might think.

# check if file exists:
if myfile.exists():   # <--- file could exist during this moment
    # <--- maybe the file is now deleted, has changed permission or something else.
    with myfile.open() as fp: # <-- will definitely raise an Exception if there is a problem
        data = fp.read() # <-- could also raise an Exception
Don't ask for permission, ask for forgiveness:
from pathlib import Path


...
myfile = Path("C:")
# try other not working Paths
...


try:
    with myfile.open() as fp:
        data = fp.read()
except FileNotFoundError:
    print("File not found")
except PermissionError:
    print("I do not have the permission to access", myfile)
except UnicodeDecodeError:
    print("Could not decode UTF-8. Binary file or wrong encoding?")
Output:
I do not have the permission to access C:
Try this with binary files and you'll get a UnicodeDecodeError.
Try this with your Samba share. It should raise FileNotFoundError, but it may be some other OSError.
Just try it and observe which Exception you get.
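To make that observation easier, here is a small sketch that reports which exception class and errno symbol an open attempt produces. The name probe_open is made up for this example. It opens in binary mode so that a UnicodeDecodeError cannot mask the result.

```python
import errno
from pathlib import Path


def probe_open(path):
    """Try to read one byte from path.

    Returns ("ok", None) on success, otherwise
    (exception class name, errno symbol such as "ENOENT").
    """
    try:
        with Path(path).open("rb") as fp:
            fp.read(1)
    except OSError as exc:
        return type(exc).__name__, errno.errorcode.get(exc.errno)
    return "ok", None
```

Printing probe_open(destpath) next to destpath.exists() for every file shows directly where the two disagree.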
What happens if you try os.listdir(...) or list(os.scandir(...)) and if you mix calls to subprocess.check_output(['ls', ...]) or subprocess.check_output("ls ...", shell=True) in between?
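One possible way to run that comparison in a single step is sketched below. The function name compare_listings is hypothetical; it collects the names reported by os.listdir, os.scandir and, when an ls binary is available, by ls -A, so the three views can be compared side by side.

```python
import os
import shutil
import subprocess


def compare_listings(directory):
    """Return the file names in directory as seen by listdir, scandir and ls."""
    views = {"listdir": sorted(os.listdir(directory))}
    with os.scandir(directory) as entries:
        views["scandir"] = sorted(entry.name for entry in entries)
    if shutil.which("ls"):
        # -A includes dotfiles but not "." and "..", like os.listdir
        out = subprocess.check_output(["ls", "-A", str(directory)], text=True)
        views["ls"] = sorted(out.splitlines())
    return views
```

If the "ls" view is empty while the Python views still list the moved files (or the other way round), that points at caching on the client rather than at pathlib.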
I tried your suggestion:

from pathlib import Path

src = "/mnt/fsdata/IT/Servers/PostGres/CDR/"
dest = src + "procd"

# define the path
currentDirectory = Path(src)
currentPattern = "*.csv"
destDirectory = Path(dest)

for currentFile in currentDirectory.glob(currentPattern):
    destFile = destDirectory / currentFile.name
    if destFile.exists():
        try:
            with destFile.open() as fp:
                data = fp.read()
                print(data)
        except FileNotFoundError:
            print("File not found")
        except PermissionError:
            # report destFile; the name myfile is not defined in this script
            print("I do not have the permission to access", destFile)
        except UnicodeDecodeError:
            print("Could not decode UTF-8. Binary file or wrong encoding?")
I get File not found.
I am the only user manipulating files in these folders. It is a test environment.
What am I doing wrong?
DeaD_EyE

OK, I tried your "Don't ask for permission, ask for forgiveness"-approach and rewrote my script:
#!/usr/bin/env python3.8

import pathlib

src = "/mnt/fsdata/IT/Servers/PostGres/CDR/"
dest = src + "procd"
dup = src + "dup" 

# define the paths:
currentDirectory = pathlib.Path(src)
destDirectory = pathlib.Path(dest)
dupDirectory = pathlib.Path(dup) 
currentPattern = "*.csv"

for currentFile in currentDirectory.glob(currentPattern):
    destFile = destDirectory / currentFile.name
    try:
        with destFile.open() as fp:
            data = fp.read()
    except IOError:
        currentFile.rename(destFile)
        print('File moved to procd.')
        continue
    # Move file to dup folder:
    duppath = dupDirectory / currentFile.name
    currentFile.rename(duppath)
    print('File moved to dup.')
print("Finished !")
When I upload files to /mnt/fsdata/IT/Servers/PostGres/CDR everything works fine.
Files are moved to /procd.
Now, when I drag those files from /procd back to /mnt/fsdata/IT/Servers/PostGres/CDR, nothing happens. The files remain where they are. It is as if the drag and drop never happened! (note that they are not moved to /dup either...)
What I basically want to achieve is this: I import files into the /CDR folder and check whether they are already in /procd. If they are, I move them to /dup; otherwise I import them into a database and then move them to /procd.
What's wrong?
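For reference, the workflow described above can be sketched as a single function. This is only an illustration under stated assumptions: the name sort_incoming is made up, the procd and dup folders are assumed to exist already, and the database import is left as a comment. It decides by trying to open the destination file, as suggested earlier in the thread, and it does not by itself cure stale directory information on the CIFS mount.

```python
from pathlib import Path


def sort_incoming(src, pattern="*.csv"):
    """Move each matching file in src to src/procd, or to src/dup if procd already has it.

    Returns a list of (file name, target folder name) pairs.
    """
    src = Path(src)
    procd, dup = src / "procd", src / "dup"
    moved = []
    for current in sorted(src.glob(pattern)):
        dest = procd / current.name
        try:
            # Actually opening the file is the test, instead of asking exists()
            with dest.open("rb"):
                already_processed = True
        except FileNotFoundError:
            already_processed = False
        target = (dup if already_processed else procd) / current.name
        # (the database import for new files would go here, before the move)
        current.rename(target)
        moved.append((current.name, target.parent.name))
    return moved
```

Returning the list of moves makes it easy to log what the script believed about each file and to compare that with what the Windows side shows.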