I have a Windows file server (Win Server 2003 Standard x64 Edition) that contains csv files.
On a Debian machine (9.13 Stretch), I mounted a Windows share from that server by adding this line to /etc/fstab:
//192.168.254.10/DATA /mnt/fsdata cifs uid=postgres,username=<*user*>,password=<*pwd*>,iocharset=utf8,sec=ntlm 0 0
I am running a Python script on the Debian machine to check whether the csv files exist on the Windows machine.
On the Windows machine, when I cut or drag all the files out of the folder and then run Path.exists() on /mnt/fsdata/IT/Servers/PostGres/CDR/, the result is always True (incorrect).
On the Windows machine, when I delete all the files and then run Path.exists() on /mnt/fsdata/IT/Servers/PostGres/CDR/, the result is False (correct).
It seems that a cut or drag is not recognized on the Debian side, and Python still "sees" the files as being there.
This is the code excerpt:
#!/usr/bin/env python3
import pathlib
src = "/mnt/fsdata/IT/Servers/PostGres/CDR/"
dest = src + "procd/"
srcpath = pathlib.Path(src)
for file in list(srcpath.glob('*.csv')):
    destpath = pathlib.Path(dest + file.name)
    # check if file already exists in /procd folder:
    if destpath.exists():
        # something happens...
        pass
What can be done to avoid this?
I searched a lot online but found no answers.
Thank you
pathlib only gives you the 'address' of where you want a file to be.
To see whether the file is actually there, use exists().
Example (untested):
from pathlib import Path
homepath = Path('.')
datapath = homepath / 'data'
datapath.mkdir(exist_ok=True)  # make data directory (only if it is not already there)
myfile = datapath / 'myfile.txt'
# check if file exists:
if myfile.exists():
    with myfile.open() as fp:
        data = fp.read()  # read through the open file object, not the Path
else:
    print("myfile.txt does not exist")
I suspect it has something to do with the way samba servers work, or samba clients. Some answers on the web, such as this one indicate that it could be a cache problem. I'm not proficient in these samba issues, but you can perhaps either fine tune the configuration of the shared directory on the Windows side or find a way for the Linux client to force the server to reread the contents of the directory. This link could be helpful too.
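One way to try the "force a reread" idea from the Python side is to ask for a fresh directory listing instead of stat()-ing the file, which is what Path.exists() does. This is only a minimal sketch: exists_via_listing is a hypothetical helper name, and whether a listing really bypasses the CIFS client cache on this mount is an assumption that has to be checked on the actual share.

```python
import os
import tempfile
from pathlib import Path


def exists_via_listing(path):
    """Return True if `path` shows up in a fresh listing of its parent directory.

    Hypothetical helper: Path.exists() stat()s the file itself, whereas this
    asks the parent directory for its current entries.
    """
    path = Path(path)
    try:
        return path.name in os.listdir(str(path.parent))
    except FileNotFoundError:
        # The parent directory itself does not exist.
        return False


if __name__ == "__main__":
    # Demo on a temporary local directory, so it runs without the share.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "Trunks-2020-11-01.csv"
        print(exists_via_listing(target))  # nothing has been created yet
        target.write_text("a,b\n")
        print(exists_via_listing(target))  # the file is now listed
```

On the share, the call would replace destpath.exists() with exists_via_listing(destpath). If both give the same stale answer, the listing is cached as well and the fix has to come from the mount or server configuration.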
(Nov-26-2020, 10:47 PM)Gribouillis Wrote: I suspect it has something to do with the way samba servers work, or samba clients. Some answers on the web, such as this one indicate that it could be a cache problem. I'm not proficient in these samba issues, but you can perhaps either fine tune the configuration of the shared directory on the Windows side or find a way for the Linux client to force the server to reread the contents of the directory. This link could be helpful too.
I added cache=none to the mount options in /etc/fstab and restarted the machine. Nothing changed.
My opinion is that the problem is on the Python side, because when I ls the folder, it is empty. After that, running the Python script still gives destpath.exists() as True.
I added the following lines to try to open the files:
with destpath.open() as f:
    print("File name: " + str(destpath))
It returns an error:
Error:
An error occurred: [Errno 2] No such file or directory: '/mnt/fsdata/IT/Servers/PostGres/CDR/procd/Trunks-2020-11-01.csv'
Could it be a bug in python? Who can I contact for that?
You could perhaps first perform an os.stat() call on the file and print all the fields of the returned stat_result object to see what it contains. A bug in Python is by far the least plausible explanation.
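A minimal sketch of that check, in case it helps: stat_fields is a hypothetical helper name, and the demo runs on a temporary local file so that it works without the share.

```python
import os
import tempfile


def stat_fields(path):
    """Return every st_* attribute of os.stat(path) as a name -> value dict."""
    result = os.stat(str(path))
    return {name: getattr(result, name) for name in dir(result) if name.startswith("st_")}


if __name__ == "__main__":
    # Demo on a temporary file; on the share, pass each destpath instead.
    with tempfile.NamedTemporaryFile(suffix=".csv") as tmp:
        tmp.write(b"a,b\n1,2\n")
        tmp.flush()
        for name, value in sorted(stat_fields(tmp.name).items()):
            print(name, "=", value)
```

Which extra fields appear (for example st_blocks or the st_*_ns timestamps) depends on the platform, so the exact list will differ between machines.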
stats are as follows:
Output:
os.stat_result(st_mode=33261, st_ino=1407374883714381, st_dev=41, st_nlink=1, st_uid=118, st_gid=0, st_size=1621, st_atime=1606730312, st_mtime=1606730312, st_ctime=1606730608)
os.stat_result(st_mode=33261, st_ino=1125899907003726, st_dev=41, st_nlink=1, st_uid=118, st_gid=0, st_size=6249, st_atime=1606730312, st_mtime=1606730312, st_ctime=1606730608)
os.stat_result(st_mode=33261, st_ino=1125899907003737, st_dev=41, st_nlink=1, st_uid=118, st_gid=0, st_size=8594, st_atime=1606730312, st_mtime=1606730312, st_ctime=1606730608)
etc...
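For anyone reading those numbers, here is a small sketch that decodes the first line with the standard stat and datetime modules. The three values are copied from the output above; nothing else is assumed about the files.

```python
import stat
from datetime import datetime, timezone

# Values copied from the first stat_result line above.
st_mode = 33261
st_mtime = 1606730312
st_ctime = 1606730608

print(oct(st_mode))            # 0o100755
print(stat.S_ISREG(st_mode))   # True: a regular file, not a directory or link
print(stat.filemode(st_mode))  # -rwxr-xr-x

modified = datetime.fromtimestamp(st_mtime, tz=timezone.utc)
changed = datetime.fromtimestamp(st_ctime, tz=timezone.utc)
print(modified.isoformat())    # modification time in UTC
print(changed - modified)      # gap between st_mtime and st_ctime
```

So the client is reporting ordinary regular files with a plausible size and timestamps. By itself that does not say whether the entries are current or stale.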
This kind of check leads to problems very often, more often than you might think.
# check if file exists:
if myfile.exists():  # <--- file could exist during this moment
    # <--- maybe the file is now deleted, has changed permission or something else.
    with myfile.open() as fp:  # <-- will definitely raise an Exception if there is a problem
        data = fp.read()  # <-- could also raise an Exception
Don't ask for permission, ask for forgiveness:
from pathlib import Path
...
myfile = Path("C:")
# try other not working Paths
...
try:
    with myfile.open() as fp:
        data = fp.read()
except FileNotFoundError:
    print("File not found")
except PermissionError:
    print("I do not have the permission to access", myfile)
except UnicodeDecodeError:
    print("Could not decode UTF-8. Binary file or wrong encoding?")
Output:
I do not have the permission to access C:
Try this with binary files and you'll get a UnicodeDecodeError.
Try this with your Samba share. It should raise FileNotFoundError, but maybe it's an OSError.
Just try it and observe which exception you get.
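To see exactly which exception comes back, something like the following sketch could be used. probe_open is a hypothetical helper name. It opens the file in binary mode so that UnicodeDecodeError cannot get in the way, and it relies on FileNotFoundError and PermissionError both being subclasses of OSError.

```python
import errno
import tempfile
from pathlib import Path


def probe_open(path):
    """Try to read one byte from `path`.

    Returns (exception class name, errno name) on failure, (None, None) on success.
    """
    try:
        with Path(path).open("rb") as fp:
            fp.read(1)
    except OSError as exc:
        return type(exc).__name__, errno.errorcode.get(exc.errno, str(exc.errno))
    return None, None


if __name__ == "__main__":
    # Demo on a temporary local directory; on the share, pass destpath instead.
    with tempfile.TemporaryDirectory() as tmp:
        print(probe_open(Path(tmp) / "missing.csv"))
```

A result of ('FileNotFoundError', 'ENOENT') on the share would match the [Errno 2] message reported earlier in the thread.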
What happens if you try os.listdir(...) or list(os.scandir(...)), and if you mix calls to subprocess.check_output(['ls', ...]) or subprocess.check_output("ls ...", shell=True) in between?
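A rough sketch of that comparison, assuming a Linux client where the ls command is available. directory_views is a hypothetical helper name, and the -A flag is only there so that ls reports hidden files the same way os.listdir does.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def directory_views(directory):
    """Return the entry names seen by os.listdir, os.scandir and `ls`, each sorted."""
    directory = str(directory)
    ls_output = subprocess.check_output(["ls", "-A", directory], universal_newlines=True)
    return {
        "listdir": sorted(os.listdir(directory)),
        "scandir": sorted(entry.name for entry in os.scandir(directory)),
        "ls": sorted(ls_output.splitlines()),
    }


if __name__ == "__main__":
    # Demo on a temporary local directory; on the share, pass the procd folder.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "a.csv").write_text("x")
        for source, names in directory_views(tmp).items():
            print(source, names)
```

If the three lists disagree on the share, or all three are empty while exists() still returns True, that narrows down which call is seeing stale data.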
I tried your suggestion:
from pathlib import Path
src = "/mnt/fsdata/IT/Servers/PostGres/CDR/"
dest = src + "procd"
# define the path
currentDirectory = Path(src)
currentPattern = "*.csv"
destDirectory = Path(dest)
for currentFile in currentDirectory.glob(currentPattern):
    destFile = destDirectory / currentFile.name
    if destFile.exists():
        try:
            with destFile.open() as fp:
                data = fp.read()
                print(data)
        except FileNotFoundError:
            print("File not found")
        except PermissionError:
            print("I do not have the permission to access", destFile)
        except UnicodeDecodeError:
            print("Could not decode UTF-8. Binary file or wrong encoding?")
I get "File not found".
I am the only user manipulating files in these folders. It is a test environment.
What am I doing wrong?

DeaD_EyE
OK, I tried your "Don't ask for permission, ask for forgiveness"-approach and rewrote my script:
#!/usr/bin/env python3.8
import pathlib
src = "/mnt/fsdata/IT/Servers/PostGres/CDR/"
dest = src + "procd"
dup = src + "dup"
# define the paths:
currentDirectory = pathlib.Path(src)
destDirectory = pathlib.Path(dest)
dupDirectory = pathlib.Path(dup)
currentPattern = "*.csv"
for currentFile in currentDirectory.glob(currentPattern):
    destFile = destDirectory / currentFile.name
    try:
        with destFile.open() as fp:
            data = fp.read()
    except IOError:
        currentFile.rename(destFile)
        print('File moved to procd.')
        continue
    # Move file to dup folder:
    duppath = dupDirectory / currentFile.name
    currentFile.rename(duppath)
    print('File moved to dup.')
print("Finished !")
When I upload files to /mnt/fsdata/IT/Servers/PostGres/CDR, everything works fine. Files are moved to /procd.
Now, when I drag those files from /procd back to /mnt/fsdata/IT/Servers/PostGres/CDR, nothing happens. The files remain where they are. It is as if the drag and drop never happened! (Note that they are not moved to /dup either.)
What I basically want to achieve here is this: I import files into the /CDR folder, check that they are not already in /procd (if they are, move them to /dup), import them into a database, and then move them to /procd.
What's wrong?
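For reference, the workflow described above can be sketched as follows. This is only an illustration under a few assumptions. process_incoming and names_in are hypothetical names, and import_file stands in for the database import, which is not shown in this thread. The duplicate check uses a directory listing of procd rather than exists() or open(), and it is unverified whether that avoids the stale results seen on this CIFS mount.

```python
import os
import tempfile
from pathlib import Path


def names_in(directory):
    """Return the set of entry names from a listing of `directory`."""
    return set(os.listdir(str(directory)))


def process_incoming(src, import_file):
    """Import each CSV in `src` and move it to src/procd, or to src/dup if already processed."""
    src = Path(src)
    procd, dup = src / "procd", src / "dup"
    procd.mkdir(exist_ok=True)
    dup.mkdir(exist_ok=True)
    moved = {"procd": [], "dup": []}
    already_done = names_in(procd)
    for csv_file in sorted(src.glob("*.csv")):
        if csv_file.name in already_done:
            # Seen before: set it aside without importing again.
            csv_file.replace(dup / csv_file.name)
            moved["dup"].append(csv_file.name)
        else:
            import_file(csv_file)  # hypothetical callback doing the database import
            csv_file.replace(procd / csv_file.name)
            moved["procd"].append(csv_file.name)
            already_done.add(csv_file.name)
    return moved


if __name__ == "__main__":
    # Demo on a temporary local directory instead of the share.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "new.csv").write_text("a,b\n")
        print(process_incoming(tmp, import_file=lambda path: print("importing", path.name)))
```

On the share, the call would be process_incoming("/mnt/fsdata/IT/Servers/PostGres/CDR/", import_file=...), with import_file replaced by the real import routine.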