web crawler that retrieves data not stored in source code

edithegodfather · (This post was last modified: Jan-08-2017, 12:56 PM by Ofnuts.)

yeah, i checked the page source and it works just fine; i don't think there's gonna be much variation in there, unless they change the whole layout of the website but in that case it's not just gonna be 1 tag that doesn't match. :D

anyway, now i've got the link part, the title part and the ad id part; all that's left is to use the adid to look up the number of views.

i'm using the code you guys gave me:

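import time

import requests
from bs4 import BeautifulSoup
from selenium import webdriver

# adid is 18238521
# views is 4
href = 'http://www.publi24.ro/anunturi/imobiliare/de-vanzare/apartamente/garsoniera/anunt/Garsoniera-Sector-1/7b006674706c6156.html'

def get_adid(item_url):
    # fetch the static page source and read the ad id from the ng-init attribute
    source_code = requests.get(item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, 'html.parser')
    for link in soup.find_all('span', {'class': 'fa fa-eye'}):
        adid = link.get('ng-init')
        num = adid.split('=')[1]
        print(num)  # prints the id (does not return it)

def views(item_url):
    # render the page in PhantomJS so the view counter filled in by javascript is present
    browser = webdriver.PhantomJS(r'D:\phantomjs-2.1.1-windows\bin\phantomjs.exe')
    browser.get(item_url)
    time.sleep(0)
    soup = BeautifulSoup(browser.page_source, 'html.parser')
    # the span holding the view count should carry add-view="<adid>"
    tag = soup.find('span', {'add-view': get_adid(href)})
    print(tag.text)
    browser.quit()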

but i'm not sure how to pass the adid from get_adid() into the dictionary used for the 'tag' lookup in views(). i tried calling get_adid(href) in there, but it just prints the adid instead of the view count.
i thought about zipping 2 lists together, but zip() doesn't give me a dictionary and both lists would have to be the same length, whereas here i'm just trying to pair 'add-view' with whatever id i get.
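
One possible way to wire the two steps together, sketched under a few assumptions: the lookup function returns the id instead of printing it, and the caller drops that value straight into the {'add-view': ...} filter, so no zipping is needed. The helper names adid_from_ng_init and get_views are made up for illustration, the functions take an already parsed soup object rather than a URL, and the ng-init value is assumed to look like 'something=18238521' (the post only shows that the id comes after the '=').

```python
def adid_from_ng_init(ng_init):
    # Assumed format: 'something=18238521'; keep only the part after '='.
    return ng_init.split('=')[1].strip()


def get_adid(soup):
    # Same lookup as in the post, but the id is returned instead of printed.
    span = soup.find('span', {'class': 'fa fa-eye'})
    if span is None:
        return None
    ng_init = span.get('ng-init')
    if not ng_init or '=' not in ng_init:
        return None
    return adid_from_ng_init(ng_init)


def get_views(rendered_soup, adid):
    # The returned id becomes the value in the attribute filter.
    tag = rendered_soup.find('span', {'add-view': adid})
    if tag is None:
        return None
    return tag.text.strip()
```

With the names from the post, the call would be roughly get_views(BeautifulSoup(browser.page_source, 'html.parser'), get_adid(BeautifulSoup(requests.get(href).text, 'html.parser'))). Whether the counter is already in the page at that point depends on how long the javascript takes to run, so a longer wait than time.sleep(0) may be needed.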
