web crawler that retrieves data not stored in source code

***metulburr*** · (This post was last modified: Jan-05-2017, 02:53 AM by metulburr.)

If you inspect the element, you are looking at the page as the browser sees it, after any JavaScript has run. If the data is not there when you fetch the page with Python, then it is being added by JavaScript. You would have to get the rendered source with Selenium first before handing it off to BeautifulSoup.
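To see why, compare what a plain HTTP fetch returns with what the browser shows. Below is a minimal sketch that assumes a hypothetical page whose view counter is an empty `<span>` that a script fills in after load. Parsing the raw HTML (as you would after `requests.get()` or `urllib`) finds the tag but no number, because the number only exists after the JavaScript runs. It uses the standard-library `html.parser` so it runs without extra packages. `SpanText` and `raw_html` are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical raw HTML as a server might send it: the counter span is empty
# and a script is expected to fill it in once the page loads in a browser.
raw_html = """
<html><body>
  <span add-view="18230886"></span>
  <script>
    document.querySelector('span[add-view]').textContent = '16';
  </script>
</body></html>
"""

class SpanText(HTMLParser):
    """Collect the text inside a <span> whose add-view attribute matches."""

    def __init__(self, add_view):
        super().__init__()
        self.add_view = add_view
        self.inside = False   # currently between the matching <span> and </span>
        self.found = False    # the matching <span> appeared at all
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("add-view") == self.add_view:
            self.inside = True
            self.found = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text += data

parser = SpanText("18230886")
parser.feed(raw_html)
print(parser.found, repr(parser.text.strip()))  # True '' -> tag exists, but no number yet
```

The script's text is never executed by the parser, so the span stays empty. Only a real browser engine (which is what Selenium drives) runs it.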

For example:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time

url = 'http://www.publi24.ro/anunturi/locuri-de-munca/anunt/Echipa-Tehnician-Alpinist-Telecom/7b00667478616b51.html'

def setup():
    '''
    Set up the webdriver and create a Chrome browser instance.
    '''
    # chromedriver downloads: https://chromedriver.storage.googleapis.com/index.html
    # (2.25 was the latest when this was written: https://chromedriver.storage.googleapis.com/index.html?path=2.25/)
    chromedriver = "/home/metulburr/chromedriver"  # the path to the chromedriver binary
    # Selenium 4 takes the driver path through a Service object
    browser = webdriver.Chrome(service=Service(chromedriver))
    return browser

browser = setup()
try:
    browser.get(url)
    time.sleep(2)  # give the page's JavaScript time to fill in the content

    # page_source is the DOM as the browser rendered it, including JS-inserted data
    soup = BeautifulSoup(browser.page_source, 'lxml')
    tag = soup.find('span', {'add-view': '18230886'})
    print(tag.text)
finally:
    browser.quit()  # always close the browser, even if parsing fails

Output:
$ python test.py
16
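The fixed `time.sleep(2)` either wastes time or, on a slow connection, is not long enough. Selenium's `WebDriverWait(browser, 10).until(...)` polls the page until a condition holds or a timeout expires. The idea behind it can be sketched in plain Python; `wait_until` is a hypothetical helper, not part of Selenium, and the "page" here is simulated by a counter so the sketch runs on its own. In real use the condition would re-parse `browser.page_source` and return the span's text once it is non-empty.

```python
import time

def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError if `timeout` seconds pass first.
    This mirrors the polling idea behind Selenium's WebDriverWait.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)

# Simulated page: the counter text only "appears" on the third check.
checks = {"count": 0}

def counter_text():
    checks["count"] += 1
    return "16" if checks["count"] >= 3 else ""

print(wait_until(counter_text, timeout=5, interval=0.01))  # 16
```

With Selenium itself, a similar wait is `WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'span[add-view]')))`, using `WebDriverWait` from `selenium.webdriver.support.ui`, `expected_conditions as EC` from `selenium.webdriver.support`, and `By` from `selenium.webdriver.common.by`. Note that presence only means the tag exists; if the span is in the page before its number is filled in, a custom condition that checks the text is needed.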

The Selenium script above will pop up a browser window for a couple of seconds, though. If you want, you can use a headless browser to keep it in the background.
