Python Forum
web crawler that retrieves data not stored in source code
#5
Hey, thanks a lot, that actually worked :D
I did make a few changes to adapt the code to my system:
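from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time

url = 'http://www.publi24.ro/anunturi/locuri-de-munca/anunt/Echipa-Tehnician-Alpinist-Telecom/7b00667478616b51.html'

def setup():
    '''
    set up the webdriver and create the browser
    '''
    # https://chromedriver.storage.googleapis.com/index.html
    # https://chromedriver.storage.googleapis.com/index.html?path=2.25/  (latest at the time)
    # raw string so the backslashes in the Windows path are kept literally
    chromedriver = r"D:\chromedriver_win32\chromedriver.exe"
    # Selenium 4 takes the driver path through a Service object;
    # Selenium 3 accepted it directly as webdriver.Chrome(chromedriver)
    return webdriver.Chrome(service=Service(chromedriver))

browser = setup()
try:
    browser.get(url)
    time.sleep(0)  # was 2: extra time for the page's JavaScript to fill in the view count

    soup = BeautifulSoup(browser.page_source, 'html.parser')
    # the view counter is the <span> carrying this add-view value
    tag = soup.find('span', {'add-view': '18230886'})
    print(tag.text if tag is not None else 'view counter not found')
finally:
    browser.quit()  # close the browser even if something above fails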
I changed the chromedriver path from "/home/metulburr/chromedriver" to where I had chromedriver.exe.
I changed time.sleep from 2 to 0 to see if it would run faster, and it still worked.
I also changed the parser from lxml to html.parser because I was getting some errors, and that got rid of them.
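If those errors came from lxml not being installed (an assumption; the post doesn't say which errors they were), BeautifulSoup raises `FeatureNotFound` for a missing parser, and a small fallback keeps lxml when it is available. `make_soup` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html):
    """Parse with lxml when it is installed, otherwise fall back to html.parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # lxml isn't installed; html.parser ships with Python itself
        return BeautifulSoup(html, "html.parser")
```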

Now I'm going to figure out how to extract all the ad IDs, run each one through soup.find, and print each ad link with its number of views.
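One way to collect every counter on a page at once is `find_all` with `attrs={"add-view": True}`, which matches any `span` that has the attribute regardless of its value. A minimal sketch, assuming every view counter on the page looks like the one above; `view_counts` is a hypothetical name, and how an add-view value maps back to an ad link isn't shown in the post, so that part is left out.

```python
from bs4 import BeautifulSoup


def view_counts(html):
    """Map each add-view value found in the page to its view-count text."""
    soup = BeautifulSoup(html, "html.parser")
    counts = {}
    # True matches any <span> that has an add-view attribute, whatever its value
    for span in soup.find_all("span", attrs={"add-view": True}):
        counts[span["add-view"]] = span.get_text(strip=True)
    return counts
```

For each ad URL, calling `browser.get(url)` and then `view_counts(browser.page_source)` would give that page's counters, which can be printed next to the URL.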

Say, is there any way to do this without having to open a browser window?


RE: web crawler that retrieves data not stored in source code - by edithegodfather - Jan-05-2017, 08:40 PM
