Python Forum
How to implement APScheduler in Python 3.6?


How to implement APScheduler in Python 3.6? - PrateekG - Jun-05-2018

Hi All,

I have written a Python script (myfile.py) that scrapes product-related data from an e-commerce site and stores it in a MySQL database.

Now I want to schedule this script to run once a week so the data is refreshed.

I have installed APScheduler for this but need your help to implement it:
https://apscheduler.readthedocs.io/en/latest/userguide.html

Can anyone please share their knowledge?


RE: How to implement APScheduler in Python 3.6? - buran - Jun-05-2018

You should know the drill already: tell us what you have tried, post your code, and ask specific questions.


RE: How to implement APScheduler in Python 3.6? - DeaD_EyE - Jun-05-2018

First read the provided examples: https://github.com/agronholm/apscheduler/tree/master/examples/schedulers

https://github.com/agronholm/apscheduler/blob/master/examples/schedulers/background.py


RE: How to implement APScheduler in Python 3.6? - PrateekG - Jun-05-2018

Yes, I have seen the examples.
But I am not sure how to call my Python script (myfile.py) from a scheduler.


RE: How to implement APScheduler in Python 3.6? - PrateekG - Jun-05-2018

The following is the content of my Python script:
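import requests
from bs4 import BeautifulSoup

# Placeholder: the original post does not show how URL is defined.
URL = "https://example.com/"


def get_soup(url):
    """Fetch url and return the parsed page, or None on failure."""
    soup = None
    try:
        response = requests.get(url)
        if response.status_code == 200:
            html = response.content
            soup = BeautifulSoup(html, "html.parser")
    except requests.RequestException as exc:
        print("Request failed:", exc)
    return soup


def get_category_urls(url):
    """Collect the category links from the site menu."""
    soup = get_soup(url)
    cat_urls = []
    if soup is None:
        return cat_urls
    categories = soup.find('div', attrs={'id': 'menu_oc'})
    if categories is not None:
        # href=True skips anchors without an href instead of raising KeyError
        for c in categories.find_all('a', href=True):
            cat_urls.append(c['href'])
    return cat_urls


def get_product_urls(url):
    """Collect the numbered pagination links."""
    soup = get_soup(url)
    prod_urls = []
    if soup is not None and soup.find('div', attrs={'class': 'pagination'}):
        for link in soup.select('div.links a'):
            # keep numbered pages only; this skips the "next" and "last" links
            if link.string and link.string.isdecimal():
                prod_urls.append(link['href'])
    print("Found following product urls::", prod_urls)
    return prod_urls


if __name__ == '__main__':
    category_urls = get_category_urls(URL)
    product_urls = get_product_urls(URL)
    # TODO: upload to the database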
I have now created scheduler-refresh.py with the following content:
import schedule
import time


def job():
    # how to call myfile.py here?
    print("refreshing...")


# A weekly job needs a weekday before .at(); Monday is an arbitrary choice.
schedule.every().monday.at("10:30").do(job)

while True:
    schedule.run_pending()
    time.sleep(1)
I don't know how to call myfile.py here. Can you help me?
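
One possible way to connect the two files, sketched with APScheduler since that is the library the thread title asks about (the scheduler snippet in the post uses the separate `schedule` package instead). This is only a sketch under a few assumptions: APScheduler 3.x is installed, myfile.py sits in the working directory, and running it as a script performs the whole refresh. The names `run_scraper` and `start_weekly_refresh` and the Monday 10:30 slot are illustrative choices, not something taken from the posts.

```python
import subprocess
import sys


def run_scraper(script="myfile.py"):
    """Run the scraper in a fresh Python process and return its exit code."""
    # sys.executable reuses the interpreter that is running the scheduler,
    # so the scraper sees the same installed packages.
    result = subprocess.run([sys.executable, str(script)])
    return result.returncode


def start_weekly_refresh():
    """Start a blocking scheduler that runs the scraper once a week."""
    # Imported here so run_scraper() stays usable without APScheduler installed.
    # Assumes the APScheduler 3.x API.
    from apscheduler.schedulers.blocking import BlockingScheduler

    scheduler = BlockingScheduler()
    # Cron trigger: every Monday at 10:30 (the weekday is an arbitrary choice).
    scheduler.add_job(run_scraper, "cron", day_of_week="mon", hour=10, minute=30)
    scheduler.start()  # blocks until the process is interrupted
```

Calling `start_weekly_refresh()` at the bottom of scheduler-refresh.py would take the place of the `while` loop. If myfile.py were refactored so that its work lives in a function, that function could probably be imported and passed to `add_job` directly, which avoids the extra process.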