Web Scraper with BeautifulSoup4 sometimes no Output - Printable Version
+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Web Scraper with BeautifulSoup4 sometimes no Output (/thread-35643.html)
Web Scraper with BeautifulSoup4 sometimes no Output - Nitsuj - Nov-25-2021

Hello, I need some help with my Python script: I want to make a web scraper to scrape some prices from this website: https://www.medizinfuchs.de/?params%5Bsearch%5D=10714367&params%5Bsearch_cat%5D=1

I wrote the following code:

from bs4 import BeautifulSoup
import requests

URL = "https://www.medizinfuchs.de/?params%5Bsearch%5D=10714367&params%5Bsearch_cat%5D=1"
my_headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
}

page = requests.get(URL, headers=my_headers)
soup = BeautifulSoup(page.text, "lxml")

for price in soup.select("ul.Apothekenliste div.price"):
    print(float(price.text.strip(' \t\n€').replace(',', '.')))

It works sometimes, but too inconsistently. I really don't know what I should be doing differently. Thanks for your help!

RE: Web Scraper with BeautifulSoup4 sometimes no Output - ghoul - Nov-26-2021

Well, for me it does not seem to work at all.

RE: Web Scraper with BeautifulSoup4 sometimes no Output - snippsat - Nov-26-2021

It works sometimes but is not stable; like this it is a little more stable. It may be even more stable using Selenium in headless mode, but the code below should work okay.
from bs4 import BeautifulSoup
import requests
import time

URL = "https://www.medizinfuchs.de/?params%5Bsearch%5D=10714367&params%5Bsearch_cat%5D=1"
my_headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
}

page = requests.get(URL, headers=my_headers)
soup = BeautifulSoup(page.text, "lxml")
time.sleep(5)
price_lst = soup.find_all('div', class_="col-xs-24 single")
for price in price_lst:
    print(price.text.strip())

If you want a continuous checker that runs over time, you should use something like schedule, or set it up as a scheduled task at the OS level.
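A minimal sketch of that continuous-checker idea, using only the standard library (the schedule package mentioned above would work just as well). Note that run_periodically, check, interval_seconds, and max_runs are names made up for this example, not anything from the thread:

```python
import time

def run_periodically(check, interval_seconds, max_runs=None):
    # Call check() repeatedly, sleeping interval_seconds between calls.
    # If max_runs is given, stop after that many calls; otherwise loop forever.
    runs = 0
    while max_runs is None or runs < max_runs:
        check()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        time.sleep(interval_seconds)

# Usage: run the scrape-and-print step (here a placeholder) every hour:
# run_periodically(lambda: print("scraping..."), interval_seconds=3600)
```

For anything long-running, an OS-level scheduler (cron on Linux/macOS, Task Scheduler on Windows) is usually more robust than a Python loop, since it survives crashes and reboots.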
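On the parsing side, the one-liner float(price.text.strip(' \t\n€').replace(',', '.')) from the first post raises a ValueError on any unexpected text. A small helper that also handles German thousands separators and returns None for non-price text may be more forgiving (parse_price is a hypothetical name for illustration, not part of the thread's code):

```python
def parse_price(text):
    # Convert a German-formatted price string like "12,34 €" or
    # "1.234,56 €" into a float. Strips the euro sign and whitespace,
    # drops "." thousands separators, and swaps the decimal comma.
    # Returns None instead of raising if the text is not a price.
    cleaned = text.strip().rstrip('€').strip()
    cleaned = cleaned.replace('.', '').replace(',', '.')
    try:
        return float(cleaned)
    except ValueError:
        return None
```

With this, the loop becomes: value = parse_price(price.text); if value is not None: print(value) — so a page layout change produces no output instead of a crash.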