Python Forum
Detect comments part using web scraping




Detect comments part using web scraping - seco - Jan-18-2018

Hi
I'm following this tutorial to scrape an Ajax comments section from a website: https://likegeeks.com/python-web-scraping/

I'm using both BeautifulSoup and Selenium to scrape the Ajax comments.
The article shows how to scrape Ajax content, but what I'm asking is:
Is there a way to detect that this section is the content section regardless of the scraped website? Like an advanced scraping library or so?

Thanks in advance.


RE: Detect comments part using web scraping - buran - Jan-18-2018

(Jan-18-2018, 08:06 PM)seco Wrote: Is there a way to detect that this section is the content section regardless of the scraped website?
Well, it always depends on the website's structure. There is no "universal" solution that fits all cases.


RE: Detect comments part using web scraping - seco - Jan-18-2018

What about detecting it using AI libraries if there are any?


RE: Detect comments part using web scraping - nilamo - Jan-18-2018

Why do you want the comments? There's rarely anything useful in them.

And there might be a library to scrape the contents. You can brute force it by just iterating over all nodes and getting the text content, but then you'll end up with a lot of extra stuff like navigation and copyright info. Saying "it's not possible" doesn't make sense, as every search engine has been doing it for years, lol
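To illustrate the brute-force idea, here is a minimal sketch that walks every node and collects its text. It uses only the standard library's html.parser (not BeautifulSoup) so it runs without extra installs. The names TextCollector, all_text and SAMPLE are made up for this example, and the sample page is invented.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects every non-empty text node, skipping <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside a <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def all_text(html):
    """Return the text of every node in document order."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Invented sample page: navigation, article, comments and footer.
SAMPLE = """
<html><body>
<nav><a href="/">Home</a></nav>
<article><p>Main article text.</p></article>
<div id="comments"><p>Nice post!</p></div>
<footer>Copyright 2018</footer>
</body></html>
"""

print(all_text(SAMPLE))
```

The navigation and copyright text come back mixed in with the article and the comment, which is exactly the "extra stuff" problem described above.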


RE: Detect comments part using web scraping - buran - Jan-18-2018

(Jan-18-2018, 09:33 PM)nilamo Wrote: Saying "it's not possible" doesn't make sense, as every search engine has been doing it for years, lol
I didn't say it's not possible, but the OP asks for an advanced scraping library, which I understand as an advanced Python library that supports extracting comments from ANY website out of the box. What you refer to is much more in the domain of AI and ML than web scraping. Pardon my skepticism that the OP is on the verge of developing the next Google-killer search engine.


RE: Detect comments part using web scraping - seco - Jan-18-2018

I need to identify links in comment sections for SEO purposes, that's all.
I don't know how effective a link is when it appears in a comment compared with one in the body of the text.
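A rough sketch of one way to split links into "in a comment section" and "elsewhere". It assumes the comment container has an id or class containing the word "comment", which is only a heuristic and will not hold for every site. LinkClassifier, classify_links and the sample markup are hypothetical names for illustration. For Ajax-loaded comments the input would need to be the rendered HTML (for example Selenium's driver.page_source), not the raw response.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class LinkClassifier(HTMLParser):
    """Sorts <a href> links by whether an ancestor looks like a comment section."""

    def __init__(self, marker="comment"):
        super().__init__()
        self.marker = marker      # assumed substring of the container's id/class
        self.stack = []           # (tag, inside_comment_section) per open element
        self.comment_links = []
        self.body_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        label = f"{attrs.get('id') or ''} {attrs.get('class') or ''}".lower()
        parent_inside = self.stack[-1][1] if self.stack else False
        inside = parent_inside or self.marker in label
        if tag == "a" and attrs.get("href"):
            target = self.comment_links if inside else self.body_links
            target.append(attrs["href"])
        if tag not in VOID_TAGS:
            self.stack.append((tag, inside))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray closers.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def classify_links(html, marker="comment"):
    """Return (links_in_comment_sections, links_elsewhere)."""
    parser = LinkClassifier(marker)
    parser.feed(html)
    parser.close()
    return parser.comment_links, parser.body_links


# Invented sample markup.
PAGE = """
<article><p>See <a href="https://example.com/in-body">this</a>.</p></article>
<div id="comments"><ul class="comment-list">
  <li><a href="https://example.com/in-comment">my site</a></li>
</ul></div>
<footer><a href="/about">About</a></footer>
"""

in_comments, elsewhere = classify_links(PAGE)
print(in_comments)
print(elsewhere)
```

Sites that name their comment container differently would need a different marker, so this still depends on the website's structure.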


RE: Detect comments part using web scraping - nilamo - Jan-18-2018

I think you should show an example page of what you're scraping. HTML comments are ignored by search engines, and thus don't matter for SEO.


RE: Detect comments part using web scraping - seco - Jan-18-2018

Who said that comments are ignored by search engines?