Python Forum
web crawler that retrieves data not stored in source code
Hey guys,

I'm working on a small program to help me gather ads from a classifieds website in my country (I'm going to post a screenshot here; I'm not sure if I can post the website link, but if that's OK I'll post that as well).

Image link: https://i.redd.it/u0g9bjzb7l7y.png

The website is a pretty basic Craigslist type of site, with all the familiar categories you would expect on an ads page.

The reason I'm working on this is that the site lets you post ads but doesn't give you the option of sorting them by number of views, which would be handy if you want to find out which ads are the newest. You can sort by date, but that doesn't order ads by the date they were created; it orders them by the date they were last updated. So if an ad is 10 years old, you post one today, and the old ad gets updated tomorrow, your ad comes second to the one that was just updated even though yours is newer. For this particular scenario, the number of views is perfect for determining an ad's age.

I've been following thenewboston's YouTube Python 3.5 tutorials, and I've managed to make the crawler grab the links of all the ads running in a given category (I used beautifulsoup4 and import modules). That part works like a beauty. The trouble starts when I try to follow every ad link and pull data from each ad's page: if I do the exact same thing there, I get nothing. Initially I figured I was doing something wrong, and it turned out I was. What bs4 and import do is fetch everything in the page source and hand it to Python to work with. But if you inspect the page of an ad in the browser, you'll see a span tag that has "add-views=[ad id]" and then the number of views, yet Python doesn't show the view count for any ad; it just shows nothing. So I looked at the page source itself and, sure enough, the number of views wasn't there either. It seems the view count isn't stored in the page source but comes from somewhere else, and that's what I'm trying to figure out.
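
A minimal sketch of the symptom, not the actual crawler. Everything in it is an assumption made for illustration: the HTML snippet, the ad id 12345, the `id="add-views-12345"` attribute, and the guess that the span arrives empty in the raw source and only gets filled in later by the browser. It uses the standard library's `html.parser` instead of bs4 so it runs without installing anything; the names `ViewSpanParser` and `view_spans` are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical raw HTML, as the server might send it before the browser
# fills anything in. The real site's markup may look different.
RAW_HTML = """
<div class="ad">
  <a href="/ad/12345">some ad title</a>
  <span id="add-views-12345"></span>
</div>
"""


class ViewSpanParser(HTMLParser):
    """Collects the text inside any span whose id starts with 'add-views'."""

    def __init__(self):
        super().__init__()
        self.current_id = None  # id of the span we are currently inside
        self.found = {}         # span id -> text found inside that span

    def handle_starttag(self, tag, attrs):
        span_id = dict(attrs).get("id") or ""
        if tag == "span" and span_id.startswith("add-views"):
            self.current_id = span_id
            self.found[span_id] = ""

    def handle_endtag(self, tag):
        if tag == "span":
            self.current_id = None

    def handle_data(self, data):
        if self.current_id is not None:
            self.found[self.current_id] += data.strip()


def view_spans(html):
    """Return {span id: text} for every 'add-views' span in the given HTML."""
    parser = ViewSpanParser()
    parser.feed(html)
    parser.close()
    return parser.found


print(view_spans(RAW_HTML))
```

Under those assumptions the span is found but its text is an empty string, which matches what the crawler reports: the tag is in the source, the number is not.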

Any ideas how to access the view count? Let me know if you need links to the page or my source code.

Thanks
Messages In This Thread
web crawler that retrieves data not stored in source code - by edithegodfather - Jan-05-2017, 12:09 AM