Python Forum

Detect the comments section using web scraping
Hi,
I'm following this tutorial to scrape an Ajax-loaded comments section from a website: https://likegeeks.com/python-web-scraping/

I'm using both BeautifulSoup and Selenium to scrape the Ajax comments.
The article shows how to scrape Ajax content, but my question is:
Is there a way to detect that this section is the content section, regardless of the website being scraped? For example, an advanced scraping library?

Thanks in advance.
(Jan-18-2018, 08:06 PM) seco wrote: Is there a way to detect that this section is the content section, regardless of the website being scraped?
Well, it always depends on the website's structure. There is no "universal" solution that fits all cases.
What about detecting it using AI libraries, if there are any?
Why do you want the comments? There's rarely anything useful in them.

And there might be a library to scrape the contents. You can brute-force it by iterating over all nodes and collecting the text content, but then you'll end up with a lot of extra stuff like navigation menus and copyright notices. Saying "it's not possible" doesn't make sense, as every search engine has been doing it for years, lol.
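A minimal sketch of that brute-force approach, using only the standard library's `html.parser` so it runs without BeautifulSoup. The sample page and the `TextCollector`/`all_text` names are made up for illustration; the point is that walking every text node picks up the navigation links and the copyright line along with the article text.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects every non-empty text node, skipping <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep every visible text node, whatever part of the page it is in.
        if text and not self._skip_depth:
            self.chunks.append(text)


def all_text(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


page = """
<html><body>
  <nav><a href="/">Home</a> <a href="/blog">Blog</a></nav>
  <article><p>The actual article text.</p></article>
  <footer>Copyright 2018 Example</footer>
  <script>var x = 1;</script>
</body></html>
"""

print(all_text(page))
# → ['Home', 'Blog', 'The actual article text.', 'Copyright 2018 Example']
```

Separating the article (or the comments) from the rest then needs some extra signal, such as the page's markup conventions or a text-density heuristic.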
(Jan-18-2018, 09:33 PM) nilamo wrote: Saying "it's not possible" doesn't make sense, as every search engine has been doing it for years, lol
I didn't say it's not possible, but the OP asks for an advanced scraping library, which I understand as an advanced Python library that supports extracting comments from ANY website out of the box. What you're referring to is much more in the domain of AI and ML than web scraping. Pardon my skepticism if the OP is on the verge of developing the next Google-killer search engine.
I need to identify links in comment sections for SEO purposes, that's all,
because I don't know how effective a link is when it's in a comment compared with when it's in the body of the text.
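One way to approach the link question is a keyword heuristic: treat any element whose `id` or `class` contains "comment" as a comments container, then label each link by whether it sits inside one. This is an assumption that only holds for sites following that naming convention (which many blog themes do), so it is not the universal detector asked about above. The sketch uses the standard library's `html.parser`; the input would be the page's HTML after the Ajax comments have loaded, e.g. Selenium's `driver.page_source`. `COMMENT_HINTS`, `LinkClassifier` and `classify_links` are hypothetical names, and the sample page is made up.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are not pushed onto the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

# Hypothetical heuristic: a container is a comments section if its id or
# class contains one of these substrings. Real sites vary.
COMMENT_HINTS = ("comment",)


class LinkClassifier(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []  # open elements as (tag, is_comment_container)
        self.links = []  # found links as (href, rel, in_comments)

    def _in_comments(self):
        return any(flag for _, flag in self.stack)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Record rel too; it is worth comparing between body and comments.
            self.links.append((attrs["href"], attrs.get("rel") or "",
                               self._in_comments()))
        if tag in VOID:
            return
        marker = " ".join(filter(None, [attrs.get("id"), attrs.get("class")])).lower()
        self.stack.append((tag, any(h in marker for h in COMMENT_HINTS)))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, which tolerates
        # children left unclosed (e.g. <li> or <p>). Unmatched end tags are ignored.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def classify_links(html):
    parser = LinkClassifier()
    parser.feed(html)
    parser.close()
    return parser.links


page = """
<article class="post-body">
  <p>See <a href="https://example.com/guide">this guide</a>.</p>
</article>
<section id="comments">
  <ol class="comment-list">
    <li><p>Nice post! <a href="https://spam.example/" rel="nofollow ugc">my site</a><br>
  </ol>
</section>
"""

for href, rel, in_comments in classify_links(page):
    print("comment" if in_comments else "body", href, rel)
```

Keyword matching is crude: an element like `<body class="has-comments">` would mark every link on the page as a comment link, so the hint list and the check would need tuning per site.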
I think you should show an example page of what you're scraping. HTML comments are ignored by search engines, and thus don't matter for SEO.
Who said that comments are ignored by search engines?