web crawler that retrieves data not stored in source code

edithegodfather · (This post was last modified: Jan-05-2017, 08:52 PM by snippsat.)

hey, thanks a lot, that actually worked :D
i did make some changes though to adapt the code to my system:

from bs4 import BeautifulSoup
from selenium import webdriver
import time
import os

url = 'http://www.publi24.ro/anunturi/locuri-de-munca/anunt/Echipa-Tehnician-Alpinist-Telecom/7b00667478616b51.html'

def setup():
   '''
   setup webdriver and create browser
   '''
   # https://chromedriver.storage.googleapis.com/index.html
   # https://chromedriver.storage.googleapis.com/index.html?path=2.25/ ##latest
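   # raw string so the backslashes in the Windows path are not read as escapes
   chromedriver = r"D:\chromedriver_win32\chromedriver.exe"  # the path to the chromedriver
   os.environ["webdriver.chrome.driver"] = chromedriver
   # passing the path positionally works on Selenium 3; Selenium 4 expects a Service object instead
   browser = webdriver.Chrome(chromedriver)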
   return browser

browser = setup()
browser.get(url)
time.sleep(0)

soup = BeautifulSoup(browser.page_source, 'html.parser')
tag = soup.find('span', {'add-view': '18230886'})
print(tag.text)
browser.quit()

i changed the location from "/home/metulburr/chromedriver" to where i had chromedriver.exe,
changed the time.sleep from 2 to 0 to see if i could make it run faster, and it still worked,
and i also changed the parser from lxml to html.parser because i was getting some errors with lxml and that got rid of them.

now i'm gonna figure out how to extract all the ad IDs, run each one through soup.find, and print each ad link with its number of views.
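
A rough sketch of that next step, assuming every view counter on a listing page is a span carrying an add-view attribute like the one above. It uses the standard library's html.parser directly so it runs without extra installs; with BeautifulSoup the same selection would be soup.find_all('span', attrs={'add-view': True}). The names AdViewParser and extract_views and the sample markup are made up for illustration, and the real page may be laid out differently.

```python
from html.parser import HTMLParser


class AdViewParser(HTMLParser):
    """Collect the text of every <span> that carries an add-view attribute."""

    def __init__(self):
        super().__init__()
        self.views = {}           # ad id -> text found inside the span
        self._current_id = None   # ad id of the span being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == 'span':
            ad_id = dict(attrs).get('add-view')
            if ad_id is not None:
                self._current_id = ad_id
                self.views[ad_id] = ''

    def handle_data(self, data):
        # only keep text while we are inside an add-view span
        if self._current_id is not None:
            self.views[self._current_id] += data

    def handle_endtag(self, tag):
        # assumes add-view spans do not contain nested spans
        if tag == 'span':
            self._current_id = None


def extract_views(page_source):
    """Return {ad id: view text} for all add-view spans in the HTML."""
    parser = AdViewParser()
    parser.feed(page_source)
    parser.close()
    return {ad_id: text.strip() for ad_id, text in parser.views.items()}


# hypothetical markup, not copied from the site
sample = '''
<div><a href="/anunt/a.html">Ad A</a> <span add-view="18230886"> 125 </span></div>
<div><a href="/anunt/b.html">Ad B</a> <span add-view="18230999">7</span></div>
<span class="other">ignored</span>
'''

for ad_id, views in extract_views(sample).items():
    print(ad_id, views)
```

In the script above, extract_views(browser.page_source) would take the place of the single soup.find call. Matching each ad ID back to its link depends on how the listing page nests the two, which this sketch does not try to guess.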

say, is there any way of doing this without having to run a browser window?
