Feb-20-2017, 05:52 PM
If you write the HTML to a file and open that file in a browser, you will see exactly what your crawler is receiving:
import requests
from bs4 import BeautifulSoup

def fundaSpider(max_pages):
    page = 1
    while page <= max_pages:
        url = 'http://www.funda.nl/koop/rotterdam/p{}'.format(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        # open in text mode with an explicit encoding; writing
        # plain_text.encode('utf-8') to a file opened with 'w'
        # raises a TypeError in Python 3
        with open('test.html', 'w', encoding='utf-8') as f:
            f.write(plain_text)
        soup = BeautifulSoup(plain_text, 'html.parser')
        ads = soup.find_all('li', {'class': 'search-result'})
        print(ads)
        page += 1

fundaSpider(1)

In my case I am getting a captcha verification. I'm not sure what is triggering the captcha, but there isn't really an automated way around it, since its whole purpose is to verify a human.
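The write-then-inspect debugging step can be sketched without a live request. This is a minimal example using a hard-coded HTML snippet in place of the funda.nl response; the point is the file handling, which must stay in text mode with an explicit encoding in Python 3:

```python
from bs4 import BeautifulSoup

# Stand-in for the downloaded page; the real code would use response.text
html = '<ul><li class="search-result">Listing 1</li></ul>'

# Write the string directly to a text-mode file with an explicit encoding.
# Calling html.encode('utf-8') here would produce bytes, which a file
# opened with 'w' (text mode) refuses to accept.
with open('test.html', 'w', encoding='utf-8') as f:
    f.write(html)

# Re-open the saved file and parse it, exactly as the browser check would
with open('test.html', encoding='utf-8') as f:
    soup = BeautifulSoup(f.read(), 'html.parser')

ads = soup.find_all('li', {'class': 'search-result'})
print(len(ads))
```

Opening test.html in a browser at this point shows the page as the crawler saw it, captcha and all.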
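One thing worth trying before giving up: some sites serve a captcha as soon as they see the default 'python-requests' User-Agent. Sending a browser-like header sometimes avoids the challenge, though nothing reliably bypasses a captcha once it is shown. A sketch (the header string here is just an illustrative value, and the URL is the one from the post):

```python
import requests

# Hypothetical browser-like User-Agent; any realistic value can be used
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

# Build the request without sending it, to show what would go on the wire;
# in the crawler you would just pass headers=headers to requests.get()
req = requests.Request('GET', 'http://www.funda.nl/koop/rotterdam/p1',
                       headers=headers).prepare()
print(req.headers['User-Agent'])
```

In the crawler this amounts to replacing `requests.get(url)` with `requests.get(url, headers=headers)`.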