Oct-19-2017, 07:36 AM
I'm new to programming and am having trouble scraping with BS4.
I'm the webmaster for a popular website (I can't share it here, but it uses the Disqus comments platform).
I want to scrape the vote count and the message text of top comments within a set range (for example, comments with 20-200 upvotes).
I noticed that:
- The vote count should be easy to scrape, since the upvote count appears in the class of the 'a' element, for example: "count-116"
- The problem is that this 'a' element isn't linked to the message text in any way I can see
I've been playing around with some code working on an example site, but so far no success:
from bs4 import BeautifulSoup
import urllib.request
import re

scrape = urllib.request.urlopen('https://disqus.com/home/discussion/channel-discussdisqus/disqus_leaderboard_what_are_the_best_sports_websites/').read()
#soup = BeautifulSoup(scrape,'lxml')
soup = BeautifulSoup(scrape, 'html.parser')
for elem in soup.find_all('a', src=re.compile('count-116')):
    print (elem['src'])

^ This was my attempt to scrape the 'a' element that contains 'count-116'. I was going to run it in a while loop with an increment:
count-20
count-21
count-22
...but sadly it doesn't work.
Can anyone help me understand the proper way?
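
One possible direction, as a minimal sketch rather than a confirmed solution: the attempt above filters on `src`, but the count is part of the `class` attribute, so the search would need `class_` instead. A single regex can then pull the number out of any "count-N" class, which avoids looping over count-20, count-21, and so on. The names `vote_count` and `comments_in_range` are made up for this example, and the container class `post` and message class `post-message` are assumptions about the markup that should be checked against the real page. Disqus usually loads comments with JavaScript, so the HTML fetched with urllib may not contain the comments at all; the sketch only covers parsing HTML that already includes them.

```python
import re

# Matches a single class name of the form "count-116" and captures the number.
COUNT_CLASS = re.compile(r"^count-(\d+)$")


def vote_count(css_classes):
    """Return the number from a 'count-N' class name, or None if there is none."""
    for name in css_classes:
        match = COUNT_CLASS.match(name)
        if match:
            return int(match.group(1))
    return None


def comments_in_range(html, low=20, high=200):
    """Return (votes, message) pairs for comments with low <= votes <= high."""
    # Imported here so vote_count() stays usable without the third-party package.
    from bs4 import BeautifulSoup  # pip install beautifulsoup4

    soup = BeautifulSoup(html, "html.parser")
    results = []
    # class_ accepts a compiled regex; it is tested against each class of a tag.
    for link in soup.find_all("a", class_=COUNT_CLASS):
        votes = vote_count(link.get("class", []))
        if votes is None or not low <= votes <= high:
            continue
        # Assumption: every comment lives inside an ancestor with class "post",
        # and its text sits in a descendant with class "post-message".
        post = link.find_parent(class_="post")
        message = post.find(class_="post-message") if post else None
        if message is not None:
            results.append((votes, message.get_text(strip=True)))
    return results
```

Calling `comments_in_range(html)` on the page source would return a list of `(votes, message)` tuples. The key idea is to start from the vote link, climb to the enclosing comment element with `find_parent`, and search downward from there for the message, which is how the two otherwise unrelated elements get linked.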