Python Forum
No Internet connection when running a Python script
#1
Hello. I have the following Python code that checks a website for changes. The script always gives me the error "Error checking website". What am I doing wrong?

import requests
import os
from bs4 import BeautifulSoup
import time
import logging
import smtplib as smtp

URL_TO_MONITOR = "https://www.yahoo.com/" #change this to the URL you want to monitor
DELAY_TIME = 15 # seconds

def process_html(string):
soup = BeautifulSoup(string, features="lxml")

    # make the html look good
    soup.prettify()
    
    # remove script tags
    for s in soup.select('script'):
        s.extract()
    
    # remove meta tags 
    for s in soup.select('meta'):
        s.extract()
    
    # convert to a string, remove '\r', and return
    return str(soup).replace('\r', '')

def webpage_was_changed():
"""Returns true if the webpage was changed, otherwise false."""
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36',
'Pragma': 'no-cache', 'Cache-Control': 'no-cache'}
response = requests.get(URL_TO_MONITOR, headers=headers)

    # create the previous_content.txt if it doesn't exist
    if not os.path.exists("previous_content.txt"):
        open("previous_content.txt", 'w+').close()
    
    filehandle = open("previous_content.txt", 'r')
    previous_response_html = filehandle.read() 
    filehandle.close()
    
    processed_response_html = process_html(response.text)
    
    if processed_response_html == previous_response_html:
        return False
    else:
        filehandle = open("previous_content.txt", 'w')
        filehandle.write(processed_response_html)
        filehandle.close()
        return True

def main():
log = logging.getLogger(__name__)
logging.basicConfig(level=os.environ.get("LOGLEVEL", "INFO"), format='%(asctime)s %(message)s')
log.info("Running Website Monitor")
while True:
try:
if webpage_was_changed():
log.info("WEBPAGE WAS CHANGED.")

                print("The website was changed")
            else:
                log.info("Webpage was not changed.")
        except:
            log.info("Error checking website.")
        time.sleep(DELAY_TIME)

if __name__ == "__main__":
main()
#2
The indentation is a mess. Fix that in your post. Also, don't use a bare except. Remove the try/except to get a meaningful error message and debug properly.
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

#3
I suggest not using try/except at all. That will provide more information about the problem.
#4
import requests
import os
from bs4 import BeautifulSoup
import time
import logging
import smtplib as smtp

try:
    import lxml
except ImportError:
    raise RuntimeError("Please install lxml")


URL_TO_MONITOR = "https://www.yahoo.com/"  # change this to the URL you want to monitor
DELAY_TIME = 15  # seconds


def process_html(string):
    soup = BeautifulSoup(string, features="lxml")

    # make the html look good
    soup.prettify()

    # remove script tags
    for s in soup.select("script"):
        s.extract()

    # remove meta tags
    for s in soup.select("meta"):
        s.extract()

    # convert to a string, remove '\r', and return
    return str(soup).replace("\r", "")


def webpage_was_changed():
    """Returns true if the webpage was changed, otherwise false."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36",
        "Pragma": "no-cache",
        "Cache-Control": "no-cache",
    }
    response = requests.get(URL_TO_MONITOR, headers=headers)

    # create the previous_content.txt if it doesn't exist
    if not os.path.exists("previous_content.txt"):
        open("previous_content.txt", "w+").close()

    filehandle = open("previous_content.txt", "r")
    previous_response_html = filehandle.read()
    filehandle.close()

    processed_response_html = process_html(response.text)

    if processed_response_html == previous_response_html:
        return False
    else:
        filehandle = open("previous_content.txt", "w")
        filehandle.write(processed_response_html)
        filehandle.close()
        return True


def main():
    log = logging.getLogger(__name__)
    logging.basicConfig(
        level=os.environ.get("LOGLEVEL", "INFO"), format="%(asctime)s %(message)s"
    )
    log.info("Running Website Monitor")

    while True:
        try:
            if webpage_was_changed():
                log.info("WEBPAGE WAS CHANGED.")
            else:
                log.info("Webpage was not changed.")

        except Exception as e:
            log.exception(e)

        time.sleep(DELAY_TIME)


if __name__ == "__main__":
    main()
Do not use a bare except. It suppresses programming errors. In this case lxml wasn't installed, which is required by bs4 (explicit features="lxml").
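
To illustrate the point about bare excepts, here is a minimal sketch using only the standard library. The function names and the module name `lxml_missing_for_demo` are hypothetical stand-ins, not part of the original script; the failing import simply imitates a missing parser.

```python
import logging

log = logging.getLogger("monitor_demo")  # hypothetical logger name


def broken_check():
    """Stand-in for webpage_was_changed(); fails the way a missing parser might."""
    import lxml_missing_for_demo  # hypothetical module name, assumed not installed

    return True


def run_with_bare_except():
    try:
        broken_check()
    except:  # swallows every exception, so the real cause is lost
        return "Error checking website."
    return "ok"


def run_with_named_except():
    try:
        broken_check()
    except Exception as e:
        log.exception(e)  # logs the message together with the full traceback
        return f"{type(e).__name__}: {e}"
    return "ok"


if __name__ == "__main__":
    print(run_with_bare_except())
    print(run_with_named_except())
```

The first function reports only the generic message, while the second names the exception type and the missing module, which is usually enough to find the cause.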

I didn't check the rest; I just ran a format tool (ruff format).
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
#5
(Mar-10-2024, 05:53 PM)DeaD_EyE Wrote: In this case lxml wasn't installed,
well, I don't know how you know it is not installed on OP machine
#6
(Mar-10-2024, 07:28 PM)buran Wrote: well, I don't know how you know it is not installed on OP machine

I tested his code and ran into this issue. lxml was not installed.
#7
(Mar-10-2024, 08:49 PM)DeaD_EyE Wrote: I tested his code and ran into this issue. lxml was not installed.
It was not installed on YOUR machine. There is no info about OP setup - may or may not be installed, we just don't know
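
One way the OP could settle this on their own machine is to ask Python which of the script's third-party modules can be found. This is only a sketch; the helper name `missing_modules` is hypothetical, and the check covers top-level module names only.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names taken from the imports in the script above.
    print(missing_modules(["requests", "bs4", "lxml"]))
```

An empty list means all three modules are importable in the interpreter that ran the check, so it should be run with the same interpreter that runs the monitor script.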
#8
(Mar-10-2024, 10:11 PM)buran Wrote: It was not installed on YOUR machine. There is no info about OP setup - may or may not be installed, we just don't know

This is why I mentioned that bare excepts are bad. Further, I had this issue with lxml, and that error is also swallowed by the bare except. The exception was raised by bs4. If his internet connection does not work, he will see that with the modified code:

        except Exception as e:
            log.exception(e)
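
If the goal is to tell a failed request apart from any other error, the handlers can be split. The sketch below is an assumption-laden illustration: it uses the standard library's urllib instead of requests so it runs without extra packages, and `check_connection` and its `fetch` parameter are hypothetical names. With requests, the analogous exception would be `requests.exceptions.ConnectionError`.

```python
import logging
import urllib.error
import urllib.request

log = logging.getLogger("monitor_demo")  # hypothetical logger name


def check_connection(url, fetch=urllib.request.urlopen):
    """Return 'ok', 'request failed', or 'other error' for a single fetch of url.

    `fetch` is injectable so the function can be exercised without a network.
    """
    try:
        fetch(url, timeout=10).close()
        return "ok"
    except urllib.error.URLError as e:
        # Covers DNS failures and refused connections; HTTPError is a
        # subclass of URLError, so HTTP error statuses land here as well.
        log.error("Request failed: %s", e.reason)
        return "request failed"
    except Exception as e:
        # Anything else (for example a missing module) is logged with its traceback.
        log.exception(e)
        return "other error"


if __name__ == "__main__":
    print(check_connection("https://www.yahoo.com/"))
```

Keeping the broad `except Exception` as the last handler preserves the traceback for unexpected errors while the first handler gives a short message for request failures.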
#9
Some improvements: a blocking time.sleep() is not the best tool for scheduling.
So I use schedule for that and loguru (great) for logging.
import requests
import os
from bs4 import BeautifulSoup
import time
from loguru import logger
logger.add("log_file.log", rotation="2 days")
import schedule
try:
    from lxml import etree
except ImportError:
    raise RuntimeError("Please install lxml with `pip install lxml`")

URL_TO_MONITOR = "https://hckrnews.com/"
CHECK_INTERVAL = 15

def process_html(site_content):
    soup = BeautifulSoup(site_content, features="lxml")
    # Combining tag selections
    for s in soup(["script", "meta"]):
        s.extract()
    return str(soup).replace("\r", "")

def webpage_was_changed():
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36",
        "Pragma": "no-cache",
        "Cache-Control": "no-cache",
    }
    response = requests.get(URL_TO_MONITOR, headers=headers)
    if not os.path.exists("previous_content.html"):
        open("previous_content.html", "w+").close()
    with open("previous_content.html", "r+") as filehandle:
        previous_response_html = filehandle.read()
        processed_response_html = process_html(response.content)
        if processed_response_html != previous_response_html:
            filehandle.seek(0)
            filehandle.write(processed_response_html)
            filehandle.truncate()
            return True
    return False

def check_webpage():
    try:
        if webpage_was_changed():
            logger.info("WEBPAGE WAS CHANGED.")
        else:
            logger.info("Webpage was not changed.")
    except Exception as e:
        logger.exception(e)

def main():
    schedule.every(CHECK_INTERVAL).seconds.do(check_webpage)
    logger.info("Running Website Monitor")
    while True:
        schedule.run_pending()
        time.sleep(1)

if __name__ == "__main__":
    main()
One more tip: I would say soup.prettify() is broken. It puts new lines inside tags, so the output does not look like standard HTML at all.
Prettier has a command-line tool, so just run prettier --write . in the folder to get correctly formatted HTML.