Python Forum
Web Scraper with BeautifulSoup4 sometimes no Output


Web Scraper with BeautifulSoup4 sometimes no Output - Nitsuj - Nov-25-2021

Hello,

I need some help with my Python script.
I want to make a web scraper to scrape some prices from this website:
https://www.medizinfuchs.de/?params%5Bsearch%5D=10714367&params%5Bsearch_cat%5D=1
I wrote the following code:
from bs4 import BeautifulSoup
import requests

URL = "https://www.medizinfuchs.de/?params%5Bsearch%5D=10714367&params%5Bsearch_cat%5D=1"

my_headers = {"User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36", "Accept":"text/html,application/xhtml+xml,application/xml; q=0.9,image/webp,image/apng,*/*;q=0.8"}
  
page = requests.get(URL, headers=my_headers)

soup = BeautifulSoup(page.text, "lxml")

for price in soup.select("ul.Apothekenliste div.price"):
    print(float(price.text.strip(' \t\n€').replace(',', '.')))
It works sometimes, but too inconsistently: on some runs it prints nothing at all.
I really don't know what I should be doing differently.

Thanks for your help!
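
The price conversion inside the loop above can be pulled into a small helper so it can be checked separately from the network request. This is only a minimal sketch: `parse_price` and `parse_prices` are hypothetical names, and the input format (a number with a decimal comma followed by a euro sign, such as "6,81 €") is an assumption about what the page returns.

```python
def parse_price(text):
    """Convert a price string such as '6,81 €' to a float."""
    # Same steps as the loop above: trim whitespace and the euro sign,
    # then swap the decimal comma for a dot.
    cleaned = text.strip(" \t\n€").replace(",", ".")
    return float(cleaned)


def parse_prices(texts):
    """Parse many price strings, skipping any that are not numeric."""
    prices = []
    for text in texts:
        try:
            prices.append(parse_price(text))
        except ValueError:
            # Non-numeric or empty text is ignored in this sketch.
            continue
    return prices
```

With the scraper above, this could be called as `parse_prices(p.text for p in soup.select("ul.Apothekenliste div.price"))`. An empty list then signals that the selector matched nothing on that run.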


RE: Web Scraper with BeautifulSoup4 sometimes no Output - ghoul - Nov-26-2021

Well, for me it does not seem to work at all.


RE: Web Scraper with BeautifulSoup4 sometimes no Output - snippsat - Nov-26-2021

It works sometimes but is not stable; like this it's a little more stable.
It may be more stable using Selenium in headless mode,
but the code below should work OK.
from bs4 import BeautifulSoup
import requests
import time

URL = "https://www.medizinfuchs.de/?params%5Bsearch%5D=10714367&params%5Bsearch_cat%5D=1"
my_headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml; q=0.9,image/webp,image/apng,*/*;q=0.8",
}
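# Download the page and parse it with the lxml parser.
page = requests.get(URL, headers=my_headers)
soup = BeautifulSoup(page.text, "lxml")
# Pause for 5 seconds (this runs after the page has been downloaded and parsed).
time.sleep(5)
# Collect every div whose class attribute is exactly "col-xs-24 single".
price_lst = soup.find_all('div', class_="col-xs-24 single")
for price in price_lst:
    print(price.text.strip())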
 
Output:
6,81 € 6,84 € 7,14 € 7,23 € 7,36 € 7,39 € 7,53 € 7,60 €
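
Since the page sometimes yields no matching elements, one option (not something proposed in this thread) is to repeat the request a few times until the parser finds something. The sketch below is an assumption-laden illustration: `fetch_until_found`, `fetch`, and `extract` are hypothetical names, and the network and parsing steps are passed in as callables so the retry logic stays independent of any library.

```python
import time


def fetch_until_found(fetch, extract, attempts=3, delay=2.0, sleep=time.sleep):
    """Call fetch() and extract() until extract returns a non-empty list.

    fetch   -- hypothetical callable returning the page HTML as a string
    extract -- hypothetical callable turning HTML into a list of results
    """
    for attempt in range(1, attempts + 1):
        html = fetch()
        results = extract(html)
        if results:
            return results
        if attempt < attempts:
            # Wait before asking the server again.
            sleep(delay)
    # Every attempt came back empty.
    return []
```

With the code above, `fetch` could be `lambda: requests.get(URL, headers=my_headers).text`, and `extract` could build the soup and return the result of the `find_all` call. Whether retrying actually helps with this particular site is untested here.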
I think if you want a continuous checker that runs over time, you should use something like schedule, or set up the scheduling at the OS level.
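
To illustrate the idea of a repeated check, here is a minimal sketch that uses a plain loop with `time.sleep` from the standard library instead of the third-party schedule package, so it is a swapped-in technique rather than the suggestion itself. `run_every` and its parameters are hypothetical names chosen for this example.

```python
import time


def run_every(job, interval_seconds, max_runs=None, sleep=time.sleep):
    """Call job() repeatedly, pausing interval_seconds between calls.

    max_runs limits the number of calls; None means run until interrupted.
    Returns the number of times job() was called.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        if max_runs is None or runs < max_runs:
            # Pause only when another run will follow.
            sleep(interval_seconds)
    return runs
```

A call such as `run_every(check_prices, 3600)` would run a hypothetical `check_prices` function once an hour until the process is stopped. For anything long-running, an OS-level scheduler such as cron or the Windows Task Scheduler is likely the more robust choice, as suggested above.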