Python Forum
Web Crawler help
#36
It would be something along the lines of....
from selenium import webdriver
from bs4 import BeautifulSoup
import time

#browser = webdriver.Firefox()
# Selenium 3 style call; Selenium 4 expects
# webdriver.Chrome(service=Service('/home/metulburr/chromedriver'))
# with Service imported from selenium.webdriver.chrome.service
browser = webdriver.Chrome('/home/metulburr/chromedriver')
browser.set_window_position(0, 0)

def fundaSpider(max_pages):
    page = 1
    while page <= max_pages:
        url = 'http://www.funda.nl/koop/rotterdam/p{}'.format(page)
        browser.get(url)
        time.sleep(1)  # short delay to allow the browser to load content
        # pause until the captcha has been solved by hand
        # (input() is Python 3; on Python 2 use raw_input())
        input('Press Enter after bypassing Captcha')
        soup = BeautifulSoup(browser.page_source, 'html.parser')
        ads = soup.find_all('li', {'class': 'search-result'})
        for ad in ads:
            title = ad.find('h3')  # first <h3> inside the listing, or None
            print(title)
        page += 1

fundaSpider(2)
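As a side note, print(title) prints the whole <h3> tag, and ad.find('h3') returns None when a listing has no heading. Here is a rough sketch of pulling out just the text, kept separate from the browser so it can be tried on a plain string. The helper name extract_titles and the sample markup are made up for illustration; the real funda.nl markup may well differ.

```python
from bs4 import BeautifulSoup


def extract_titles(html):
    """Return the heading text of every 'search-result' list item.

    Hypothetical helper: it assumes the same markup as the crawler
    (<li class="search-result"> containing an <h3>).
    """
    soup = BeautifulSoup(html, 'html.parser')
    titles = []
    for ad in soup.find_all('li', {'class': 'search-result'}):
        heading = ad.find('h3')
        if heading is None:
            # listing without a heading, skip it instead of printing None
            continue
        # join nested text pieces with a space and trim whitespace
        titles.append(heading.get_text(' ', strip=True))
    return titles


# Made-up sample markup, only to show the call
sample = '''
<ul>
  <li class="search-result"><h3> Examplestraat 1 <small>3011 AA Rotterdam</small></h3></li>
  <li class="search-result"><p>no heading here</p></li>
  <li class="other"><h3>Not an ad</h3></li>
</ul>
'''

for title in extract_titles(sample):
    print(title)
```

Inside the spider you would call extract_titles(browser.page_source) in place of the find_all loop.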
A couple of things:
- I used Chrome as I don't have the Firefox driver on my system.
- I repositioned the window to the top left because I have dual monitors, and otherwise it opens on my TV when I run it.
- You can use PhantomJS to keep it in the background instead of popping up a browser.
- I kept trying to bring up the captcha, and this time I didn't get one, so I don't know exactly what happens after the captcha is entered, or how often it occurs. The input is just there to stop the program until the captcha is entered, and it is currently placed so it runs for every page. This assumes the captcha gets triggered on every page. If you only get the captcha the first time, you can move the input out of the while loop, but then you will need to do an extra get() beforehand to trigger the captcha,
such as....
browser.get(url) #trigger captcha by going to the first page
input() #halts program until the captcha is entered (raw_input() on Python 2)
...
while page <= max_pages:
    ...
    browser.get(url) #now go to the real page, as the captcha will not be triggered again
    ...
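Filled in a bit more, the captcha-once version could look something like the sketch below. This is only one way to arrange it, assuming the captcha really does show up only on the first request. The names page_url and crawl_pages and the pause/delay parameters are mine, not part of Selenium; browser is any object with get() and page_source, such as the webdriver instance above.

```python
import time


def page_url(page):
    # same URL pattern as in the spider above
    return 'http://www.funda.nl/koop/rotterdam/p{}'.format(page)


def crawl_pages(browser, max_pages, pause=input, delay=1):
    """Trigger the captcha once, then fetch each page's HTML.

    pause is called a single time before the loop (input by default,
    so the program halts until Enter is pressed). delay is the sleep
    in seconds after each page load.
    """
    # first request only exists to bring up the captcha
    browser.get(page_url(1))
    pause('Press Enter after bypassing Captcha')

    sources = []
    for page in range(1, max_pages + 1):
        # now go to the real page; assumed not to trigger the captcha again
        browser.get(page_url(page))
        time.sleep(delay)
        sources.append(browser.page_source)
    return sources
```

With the real browser you would call crawl_pages(browser, 2) and feed each returned page source to BeautifulSoup as before. If the captcha comes back on later pages, this arrangement will not cope and the input has to go back inside the loop.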