Jan-16-2018, 08:42 AM
(This post was last modified: Jan-16-2018, 08:42 AM by digitalmatic7.)
Hey guys, I'm trying to improve my scraping skills, so I've been practicing on some hard-to-obtain data. I've run into issues with the following page:
http://www.foxnews.com/tech/2018/01/11/h...tline.html
Elements to scrape:
- Article Time Published [data-time-published]
- Message Count [param-0 param-messagesCount]
- Message Time Stamp [message-timestamp]
The 'time published' value is being updated by AJAX or JavaScript (or some other kind of sorcery). Everything in the comments section appears to be loaded the same way.
So my question is: how would you guys approach this problem? I'm currently using requests. Should I load the page with Selenium instead? Would that make these elements easier to scrape?
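To make the question concrete, here is a minimal sketch of the check that separates the two cases: if an attribute like data-time-published is already in the raw HTML, plain requests is enough; if it is missing, the value is probably filled in by JavaScript after load. The sample markup, the timestamp in it, and the names AttrFinder and find_attr are all made up for illustration, and only the standard library is used so it runs without network access.

```python
from html.parser import HTMLParser


class AttrFinder(HTMLParser):
    """Collect the values of one attribute from every tag that carries it."""

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name == self.attr_name:
                self.values.append(value)


def find_attr(html, attr_name):
    """Return every value of attr_name found in the given HTML string."""
    parser = AttrFinder(attr_name)
    parser.feed(html)
    parser.close()
    return parser.values


# Hypothetical markup, NOT copied from the real page.
# Case 1: the server already put the value in the HTML.
SAMPLE_STATIC = '<time data-time-published="2018-01-11T10:00:00-05:00"></time>'
# Case 2: the attribute is absent until a script adds it.
SAMPLE_DYNAMIC = '<time class="article-date"></time>'

static_hits = find_attr(SAMPLE_STATIC, "data-time-published")
dynamic_hits = find_attr(SAMPLE_DYNAMIC, "data-time-published")
```

With requests, the same function could be fed requests.get(url).text. An empty list for the real page would suggest the value only exists after scripts run, which is the case where a browser-driven tool such as Selenium (or finding the underlying AJAX request) would come in.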
Any advice is greatly appreciated.