Feb-20-2017, 03:10 PM
As of today I have an issue: BeautifulSoup no longer shows all the HTML of the requested page.
If I run print(soup), I do not get all the code that I see when I use "Inspect source code" on the web page. Before today the two matched and my code ran without problems. If I run the code now, this is the result:
import requests
from bs4 import BeautifulSoup
import re


def fundaSpider(max_pages):
    page = 1
    while page <= max_pages:
        url = 'http://www.funda.nl/koop/rotterdam/p{}'.format(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        ads = soup.find_all('li', {'class': 'search-result'})
        print(ads)
        page += 1


fundaSpider(1)
Output:[]
Process finished with exit code 0
In my web browser I have no problem accessing the web page. Is it possible that the website is blocking the crawler, but not me as a person? Is there any way I can keep running the crawler? (just for the record, I use this crawler only for personal use and run it a few times per week).
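One way to narrow this down might be to look at the HTTP status code and to retry with a browser-like User-Agent header, since some sites answer the default python-requests client differently from a browser. The following is only a minimal sketch: diagnose, fetch_page and BROWSER_HEADERS are hypothetical names, the User-Agent string is just an example value, and whether funda.nl reacts to any of this is an assumption that I have not confirmed.

```python
# Hypothetical helpers for checking whether the site answers a script
# differently from a browser. None of these names come from a library.

# Example browser-like header; the exact string is an assumption.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
}


def diagnose(status_code, html):
    """Give a rough label for a response, based on its status and body."""
    # These status codes commonly indicate refusal or rate limiting.
    if status_code in (403, 429, 503):
        return "blocked or rate-limited"
    # The original spider looks for <li class="search-result"> elements.
    if "search-result" not in html:
        return "no listings in HTML"
    return "listings present"


def fetch_page(url, headers=None):
    """Fetch a URL and return (status_code, text)."""
    # Imported here so diagnose() can be used without requests installed.
    import requests

    response = requests.get(url, headers=headers, timeout=10)
    return response.status_code, response.text
```

Calling fetch_page(url) once without headers and once with headers=BROWSER_HEADERS, then passing each result to diagnose, would show whether the two requests are treated differently. If both come back as "no listings in HTML" with status 200, the listings may be loaded by JavaScript or the page markup may have changed, which a header alone would not fix.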