Posts: 8
Threads: 1
Joined: Jun 2019
Jun-28-2019, 09:49 PM
(This post was last modified: Jun-28-2019, 09:50 PM by redwood.)
Hello Members,
I am working on an automation task in which I need to monitor a location in the cloud; if there is a new file, my Python program is supposed to do some processing on it and save the latest timestamp in a variable.
I have written code that does this job perfectly. Since I used a 'while' loop, it is meant to run in an infinite loop. But an unexpected situation has come up, and now I need to run my program through a cron job. So, obviously, I cannot use the 'while' loop anymore. Instead I need to save the latest timestamp in a file and re-read it every time the cron job runs. Below is my code snippet:
import glob
import logging
import os
import subprocess
import time

logger = logging.getLogger("")
logger.setLevel(logging.INFO)
formatter = logging.Formatter("%(asctime)s:%(levelname)s:%(name)s:%(message)s")
file_handler = logging.FileHandler("/var/log/ab.log")
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)

comp_timestamp = None
while True:
    source_files = list(glob.iglob("<path>/*.deb"))
    if source_files:
        latest_file = max(source_files, key=os.path.getmtime)
        latest_timestamp = os.path.getmtime(latest_file)
        if (comp_timestamp is None) or (latest_timestamp > comp_timestamp):
            try:
                logging.info("copying latest file '{}'..".format(latest_file))
                subprocess.Popen(["cp", "-r", latest_file, "/tmp"])
                logging.info("waiting for 5 seconds...")
                time.sleep(5)
                comp_timestamp = latest_timestamp
                os.chdir("/tmp")
                deb_files = list(glob.iglob("/tmp/*.deb"))
                if deb_files:
                    local_file = max(deb_files, key=os.path.getmtime)
                    local_timestamp = os.path.getmtime(local_file)
                    logging.info("depackaging local debian file '{}'..".format(local_file))
                    subproc = subprocess.Popen(["dpkg", "--install", local_file])
                    logging.info(subproc.__dict__)
                    time.sleep(5)
                else:
                    logging.error("Latest file not found")
            except AttributeError as att:
                logging.exception(latest_file)
        else:
            logging.info("No newer files found.")
    else:
        logging.warning("No files found at all.")
    time.sleep(1)

I can use the python file handling as shown below to write and then read the file. Please correct me if I am wrong:
with open('test.txt', 'w') as tp:
    tp.write('my name is khan')
with open('test.txt') as re:
    data = re.read()
But I am not sure how or where to use them so that, during every cron cycle, the file is read and the timestamp of the latest file is compared with the timestamp stored in the file.
Please let me know if my explanation is not clear.
Thank you,
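One way to carry the timestamp across cron runs is to keep it in a small state file: read it when the script starts, compare it with the newest file's modification time, and write the new value back. The sketch below is a minimal illustration using only the standard library; the function names `read_stamp`, `write_stamp` and `check_once`, and any state-file path passed to them, are hypothetical and not part of the original script.

```python
import glob
import os


def read_stamp(stamp_path):
    """Return the saved timestamp, or None if no usable stamp file exists."""
    try:
        with open(stamp_path) as f:
            return float(f.read().strip())
    except (OSError, ValueError):
        # Missing or unreadable content: treat this as the first run.
        return None


def write_stamp(stamp_path, timestamp):
    """Persist the timestamp so the next run can compare against it."""
    with open(stamp_path, "w") as f:
        f.write(repr(timestamp))


def check_once(pattern, stamp_path):
    """Return the newest matching file if it is newer than the saved stamp, else None."""
    files = glob.glob(pattern)
    if not files:
        return None
    latest_file = max(files, key=os.path.getmtime)
    latest_timestamp = os.path.getmtime(latest_file)
    saved = read_stamp(stamp_path)
    if saved is None or latest_timestamp > saved:
        write_stamp(stamp_path, latest_timestamp)
        return latest_file
    return None
```

Each cron run would call `check_once` once and act on the returned path, replacing the body of the `while` loop. In a real script it is probably safer to call `write_stamp` only after the copy and install steps have succeeded, so that a failed run is retried.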
Posts: 479
Threads: 86
Joined: Feb 2018
Jun-29-2019, 05:34 AM
(This post was last modified: Jun-29-2019, 05:34 AM by SheeppOSU.)
I'm not familiar with those imports, so I can't tell you where to put the save and load, but here is an example. For this example I have already placed a number in the document.
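import random

# Load the previously saved number (read() returns a string, so convert it)
with open('data.txt', 'r') as file:
    data = int(file.read())

count = 0
while True:
    guess = int(input('Guess an integer between 0 and 11\n'))
    count += 1
    if guess == data:
        break
print('You guessed it in %s tries' % count)

# Save a new number for the next run (write() needs a string)
with open('data.txt', 'w') as file:
    file.write(str(random.randint(1, 10)))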
Posts: 7,312
Threads: 123
Joined: Sep 2016
Jun-29-2019, 06:27 PM
(This post was last modified: Jun-29-2019, 06:27 PM by snippsat.)
I started testing a little, and this is what I came up with.
It's also better to use functions to get some structure.
import os
import platform
import logzero
from logzero import logger

# 3 rotations, each with a maximum filesize of 1MB:
path = '/home/tom/Documents/stamp/'
logzero.logfile(f"{path}logfile.log",
                maxBytes=1e6, backupCount=3, disableStderrLogger=True)

def newest_file(path='.'):
    files = os.listdir(path)
    paths = [os.path.join(path, basename) for basename in files]
    if platform.system() == 'Windows':
        return max(paths, key=os.path.getctime)
    else:
        return max(paths, key=os.path.getmtime)

def stamp(newest_file):
    file_stamp = os.path.getmtime(newest_file)
    return file_stamp, newest_file

def file_compare(file_stamp, file_name):
    try:
        with open(f'{path}stamp.txt') as f:
            old_stamp = float(f.read())
        if old_stamp == file_stamp:
            print(f'No change: {file_name} --> {file_stamp}')
        else:
            print(f'New file: {file_name} --> {file_stamp}')
            logger.info(f'{file_name} --> {file_stamp}')
            with open(f'{path}stamp.txt', 'w') as f:
                f.write(str(file_stamp))
    except OSError:
        with open(f'{path}stamp.txt', 'w') as f:
            f.write(str(file_stamp))

if __name__ == '__main__':
    newest_file = newest_file()
    file_stamp = stamp(newest_file)[0]
    file_name = os.path.basename(stamp(newest_file)[1])
    #print(file_name)
    #print(file_stamp)
    #--- Start
    file_compare(file_stamp, file_name)

So if a new file is created in the folder, it writes a timestamp and a log file to a different folder <stamp>.
On the first run it just writes the timestamp file, so there is something to compare against.
The log file:
Output: [I 190629 20:06:01 lastet_file:27] car.txt --> 1561831549.9515111
[I 190629 20:06:26 lastet_file:27] test.py --> 1561831577.1605363
[I 190629 20:08:25 lastet_file:27] train.rar --> 1561831679.412104
If no new file,then a test print.
Output: No change: train.rar --> 1561831679.412104
I have tested it on Windows and Linux; it should be easy to modify so that it does stuff with subprocess as you do.
For logging, logzero is good, as all the boilerplate code is done for you.
Posts: 8
Threads: 1
Joined: Jun 2019
Let me do some testing with this and then I can put up my results....Thank you for taking this up..
Posts: 8
Threads: 1
Joined: Jun 2019
Hello Snippsat,
In my working environment I cannot use any module that is not already installed; otherwise it has to go through many whys from my management. So I will have to use the "logging" module only. Also, while testing the code I made the two observations below.
1) When I use your program as-is in another environment where I can use "logzero", I see the same output as yours. However, when I commented logzero out and ran the program in the environment where it is not allowed, I did not see any output on my screen. Could you please let me know what I did wrong?
Below are the lines that I commented out before executing in my prod environment.
import logzero
from logzero import logger
logzero.logfile(f"{path}logfile.log", maxBytes=1e6, backupCount=3, disableStderrLogger=True)

2) The "stamp.txt" is getting created at the location which I am monitoring. As a standard the "stamp.txt" needs to be created on the machine where I am running the code from.
Sorry if my questions are dumb, I am quite new to coding and still learning to apply logic especially when defining functions.
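If only the standard library is allowed, the rotating log file that logzero sets up can be approximated with `logging.handlers.RotatingFileHandler`. The following is a minimal sketch, not a drop-in replacement for logzero: the helper name `make_logger` and the logger name are assumptions made for illustration, and the format string is the one from the first post.

```python
import logging
import sys
from logging.handlers import RotatingFileHandler


def make_logger(log_path, name="watch_folder"):
    """Build a logger that writes to a rotating file and echoes to the screen."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if logger.handlers:
        # Already configured (e.g. the function was called twice): reuse it.
        return logger
    formatter = logging.Formatter("%(asctime)s:%(levelname)s:%(name)s:%(message)s")

    # Keep 3 backups of about 1 MB each, similar to the logzero call above.
    file_handler = RotatingFileHandler(log_path, maxBytes=1_000_000, backupCount=3)
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    # Also print to the console so messages are visible when run by hand.
    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setFormatter(formatter)
    logger.addHandler(console_handler)
    return logger
```

With this in place, `logger = make_logger("/path/to/logfile.log")` (path is a placeholder) could stand in for the two logzero imports and the `logzero.logfile(...)` call, and the existing `logger.info(...)` line would keep working unchanged.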
Posts: 8
Threads: 1
Joined: Jun 2019
Hello Snippsat,
1) I figured out on how to bring the stamp.txt to my local machine.
2) I could now see the output on my CLI properly.
I have noticed one very strange behavior, though. The code lists files only from the location where the code resides (say location a). When I changed the "path" to point to some other location (say location b), it still lists location a. So I tried the following:
path = glob.iglob('/tmp/*.txt')
def newest_file(path):
    #files = os.listdir(path)
    #paths = [os.path.join(path, basename) for basename in files]
    if platform.system() == 'Windows':
        return max(path, key=os.path.getctime)
    else:
        return max(path, key=os.path.getmtime)

Now, I get below error:

def newest_file(path='.'):
    #files = os.listdir(path)
    #paths = [os.path.join(path, basename) for basename in files]
    if platform.system() == 'Windows':
        return max(path, key=os.path.getctime)
    else:
        return max(path, key=os.path.getmtime)
Please help me here as well.
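One thing worth checking in the first version of the script: `newest_file()` is called there with no argument, so it falls back to its default `path='.'`, i.e. the directory the script is run from. The module-level `path` variable only controls where the stamp and log files go. A minimal sketch that passes the watched folder explicitly follows; the parameter names `watch_dir` and `pattern` are illustrative assumptions.

```python
import glob
import os


def newest_file(watch_dir, pattern="*"):
    """Return the most recently modified file in watch_dir matching pattern, or None."""
    # glob.glob returns a list, so it can be inspected more than once
    # (glob.iglob returns a one-shot iterator instead).
    matches = [
        p for p in glob.glob(os.path.join(watch_dir, pattern))
        if os.path.isfile(p)
    ]
    if not matches:
        return None
    return max(matches, key=os.path.getmtime)
```

For example, `newest_file('/tmp', '*.txt')` would look only at `.txt` files directly inside `/tmp`, wherever the script itself is stored.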
Posts: 3,458
Threads: 101
Joined: Sep 2016
(Jul-01-2019, 03:56 PM)redwood Wrote: Now, I get below error:
So what's the error you get?
Posts: 8
Threads: 1
Joined: Jun 2019
Jul-01-2019, 04:12 PM
(This post was last modified: Jul-01-2019, 04:28 PM by redwood.)
Please ignore my last two posts; I was able to fix them on my own.
Okay, so after fixing the issues, I tested it. On the first run it creates .stamp.txt, and the code seems to recognize that file as the new file. I have now executed the program three times, and the code still recognizes .stamp.txt as a new file every time; I can see the timestamp in stamp.txt change on each run.
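A likely explanation, assuming the stamp file lives inside the watched folder: every run rewrites it, so it always has the most recent modification time and is picked up as the "new" file. Keeping the stamp file in a separate folder avoids this. Alternatively, the scan can skip bookkeeping files by name, as in this minimal sketch (the `exclude` parameter and its default names are hypothetical):

```python
import os


def newest_file(folder, exclude=("stamp.txt", ".stamp.txt", "logfile.log")):
    """Return the most recently modified regular file in folder, skipping excluded names."""
    candidates = []
    for name in os.listdir(folder):
        full_path = os.path.join(folder, name)
        # Skip our own bookkeeping files and anything that is not a regular file.
        if name in exclude or not os.path.isfile(full_path):
            continue
        candidates.append(full_path)
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)
```

Returning `None` for an empty folder lets the caller decide what to do, instead of letting `max()` raise `ValueError` on an empty list.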
Posts: 7,312
Threads: 123
Joined: Sep 2016
I looked at this some more and wrote an improved version, battle-tested on Windows and Linux.
It now does a recursive scan of the folder, using os.walk and pathlib's glob.
So you can now choose to watch all files or only one file extension, e.g. deb, txt, jpg, etc.
The setup has been made easy.
# watch_folder.py
import os, sys
import platform
import logzero
from logzero import logger
from pathlib import Path, PurePath

def newest_file(folder_to_watch, file_extension):
    if file_extension == '*':
        fn = []
        for root, dirs, files in os.walk(folder_to_watch):
            for file in files:
                fn.append(os.path.join(root, file))
        try:
            return max(fn, key=os.path.getctime)
        except ValueError:
            print('No file in folder')
            sys.exit()
    else:
        paths = []
        for filename in Path(folder_to_watch).glob(f'**/*.{file_extension}'):
            paths.append(filename)
        try:
            return max(paths, key=os.path.getctime)
        except ValueError:
            print(f'No file with extension <{file_extension}>')
            sys.exit()

def stamp(newest_file):
    file_stamp = os.path.getmtime(newest_file)
    return file_stamp, newest_file

def file_compare(file_stamp, file_name, log_folder):
    # 3 rotations, each with a maximum filesize of 1MB
    logzero.logfile(f"{log_folder}logfile.log",
                    maxBytes=1e6, backupCount=3, disableStderrLogger=True)
    try:
        with open(f'{log_folder}stamp.txt') as f:
            old_stamp = float(f.read())
        if old_stamp == file_stamp:
            print(f'No change: {file_name} --> {file_stamp}')
        else:
            print(f'New file: {repr(file_name)} --> {file_stamp}')
            logger.info(f'{file_name} --> {file_stamp}')
            with open(f'{log_folder}stamp.txt', 'w') as f:
                f.write(str(file_stamp))
    except OSError:
        with open(f'{log_folder}stamp.txt', 'w') as f:
            f.write(str(file_stamp))

if __name__ == '__main__':
    #--- Setup ---#
    folder_to_watch = '/home/tom/Documents/test_folder/'
    # Watch a specific file extension, e.g. txt, jpg, zip, etc.
    # All files default *
    file_extension = '*'
    # Folder to save filestamp and log file
    log_folder = '/home/tom/Documents/stamp/'
    #---#
    newest_file = newest_file(folder_to_watch, file_extension)
    file_stamp = stamp(newest_file)[0]
    file_name = os.path.basename(stamp(newest_file)[1])
    #--- Start ---#
    file_compare(file_stamp, file_name, log_folder)

For an all-Python scheduling solution you can use schedule.
import subprocess
import schedule
import time

def watch_folder():
    subprocess.run(['python', 'watch_folder.py'])

if __name__ == '__main__':
    schedule.every(.1).minutes.do(watch_folder)
    while True:
        schedule.run_pending()
        time.sleep(1)

So you can call it a very lite version of watchdog.
Watchdog handles many filesystem events and can also execute shell commands.
Posts: 8
Threads: 1
Joined: Jun 2019
Jul-03-2019, 02:24 PM
(This post was last modified: Jul-03-2019, 02:24 PM by redwood.)
I am currently testing the new program you shared, and so far it seems to give the right results. I just have one question for now, to complete my testing as a whole.
As you know, I need to run subprocess.Popen() on the latest file. So I assume I would have to call it in
the "def file_compare" function, within the "else" block, as that is the part that works on the latest file. Or do I need to use that "schedule" module?
Thank you,
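As presented in the thread, schedule is only an alternative to cron for triggering the script, so the processing step would most plausibly go in the branch that detects a new file. The sketch below illustrates that placement under some assumptions: `run_on_new_file` and the `command` parameter are hypothetical names, `file_compare` here takes the old stamp as an argument instead of reading it from disk, and the full path of the file (not just its basename) is passed in so the command can find it.

```python
import subprocess


def run_on_new_file(file_path, command=("dpkg", "--install")):
    """Run command with file_path appended and wait for it to finish.

    Returns True if the command exited with status 0.
    """
    # subprocess.run blocks until the process exits, so no fixed sleep is needed.
    result = subprocess.run([*command, str(file_path)])
    return result.returncode == 0


def file_compare(old_stamp, file_stamp, file_path, command=("dpkg", "--install")):
    """Act on the file only when its stamp differs from the saved one."""
    if old_stamp == file_stamp:
        print(f"No change: {file_path} --> {file_stamp}")
        return False
    print(f"New file: {file_path} --> {file_stamp}")
    # This is the "else" branch of the original file_compare.
    return run_on_new_file(file_path, command)
```

Using `subprocess.run` instead of `subprocess.Popen` followed by `time.sleep(5)` means the script waits exactly as long as the command takes, and the return value tells the caller whether it is safe to update the stamp file.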