Python Forum
Challenging BS4 Problem
#1
Hey guys, I'm trying to improve my scraping abilities, so I've been practicing on some tough-to-obtain data. I've run into issues with the following page:

http://www.foxnews.com/tech/2018/01/11/h...tline.html

Elements to scrape:

  1. Article Time Published [data-time-published]
  2. Message Count [param-0 param-messagesCount]
  3. Message Time Stamp [message-timestamp]

The 'time published' value is being filled in by AJAX or JavaScript, or some other kind of sorcery. Everything in the comments section appears to be loaded the same way.

So my question is: how would you guys approach solving this problem? I'm currently using requests. Should I load the page with selenium instead? Would that make it easier to scrape?

Any advice is greatly appreciated.
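
A quick way to frame the requests-vs-selenium question is to check whether the three targets exist at all in the raw HTML that requests returns. The sketch below is only an illustration: it uses the standard library's html.parser instead of BS4 to stay dependency-free, it assumes `param-messagesCount` and `message-timestamp` are class names (the list above does not say), and the helper names `TargetFinder`, `find_targets` and `needs_browser` are made up for this example.

```python
from html.parser import HTMLParser

# Targets taken from the list above. Treating the last two as CSS class
# names is an assumption.
TARGET_ATTRS = ["data-time-published"]
TARGET_CLASSES = ["param-messagesCount", "message-timestamp"]


class TargetFinder(HTMLParser):
    """Record which target attributes and classes occur in an HTML string."""

    def __init__(self, attr_names, class_names):
        super().__init__()
        self.attr_names = set(attr_names)
        self.class_names = set(class_names)
        self.found = {}  # target name -> attribute values or tag names seen

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Attribute targets: store the attribute's value.
        for name in self.attr_names.intersection(attr_map):
            self.found.setdefault(name, []).append(attr_map[name])
        # Class targets: store the tag that carries the class.
        classes = set((attr_map.get("class") or "").split())
        for name in self.class_names.intersection(classes):
            self.found.setdefault(name, []).append(tag)


def find_targets(html):
    """Return a dict of the targets present in the given HTML text."""
    finder = TargetFinder(TARGET_ATTRS, TARGET_CLASSES)
    finder.feed(html)
    finder.close()
    return finder.found


def needs_browser(html):
    """True if at least one target is missing from this HTML."""
    expected = set(TARGET_ATTRS) | set(TARGET_CLASSES)
    return not expected.issubset(find_targets(html))
```

Feeding it `requests.get(url).text` would show which pieces are already in the static response. If `needs_browser` returns True, the missing parts are presumably injected after page load, so the usual options are rendering the page with selenium or finding the background request in the browser's network tab and calling that directly.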



Messages In This Thread
Challenging BS4 Problem - by digitalmatic7 - Jan-16-2018, 08:42 AM
RE: Challenging BS4 Problem - by stranac - Jan-16-2018, 09:29 AM
RE: Challenging BS4 Problem - by digitalmatic7 - Jan-16-2018, 01:18 PM
RE: Challenging BS4 Problem - by snippsat - Jan-16-2018, 02:12 PM
RE: Challenging BS4 Problem - by digitalmatic7 - Jan-16-2018, 02:58 PM
RE: Challenging BS4 Problem - by digitalmatic7 - Jan-16-2018, 04:21 PM
