May-30-2024, 12:07 PM
I'm trying to build a web scraper using BS4. I want to filter news articles and get only the href links of the articles that have more than a given number of votes. For example, my code looks something like this:
import requests
from bs4 import BeautifulSoup

res = requests.get("https://news.ycombinator.com/")
soup = BeautifulSoup(res.text, "html.parser")
titles = soup.select(".titleline")
votes = soup.select(".score")
selected_titles = []
amount_of_votes = int(input("How many votes should the articles have?\n"))
for index, vote in enumerate(votes):
    if int(vote.text.split()[0]) > amount_of_votes:
        href = titles[index].get('href')
        print(href)
        selected_titles.append({"title": titles[index].getText(), "link": href})

I am printing the href inside the condition to check whether it is working. My problem is that every href value that comes out is None. I am doing a bootcamp, and the instructor's code was:
href = titles[index].get('href', None)

I have tried both the first version I posted and the second one, but all I get for the href values is None. I only want the URL links of articles that have more than a given number of votes. Please help.
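
For anyone reproducing this without hitting the live site, here is a minimal sketch of one likely cause. It assumes that `.titleline` matches a `<span>` wrapping the link rather than the `<a>` itself, so `.get('href')` on the span returns None. The `SAMPLE_HTML` markup and the `links_above` helper are made up for illustration; the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modeled on a news listing;
# an assumption for illustration, not a copy of the real page.
SAMPLE_HTML = """
<table>
  <tr class="athing" id="1">
    <td class="title"><span class="titleline"><a href="https://example.com/a">Story A</a></span></td>
  </tr>
  <tr><td class="subtext"><span class="score" id="score_1">150 points</span></td></tr>
  <tr class="athing" id="2">
    <td class="title"><span class="titleline"><a href="https://example.com/b">Story B</a></span></td>
  </tr>
  <tr><td class="subtext"><span class="score" id="score_2">40 points</span></td></tr>
</table>
"""


def links_above(html, min_votes):
    """Return title/link dicts for stories with more than min_votes points."""
    soup = BeautifulSoup(html, "html.parser")
    # Select the <a> directly inside each .titleline, not the span itself.
    links = soup.select(".titleline > a")
    votes = soup.select(".score")
    selected = []
    # Pairing by position assumes every story has a score element.
    for link, vote in zip(links, votes):
        points = int(vote.get_text().split()[0])
        if points > min_votes:
            selected.append({"title": link.get_text(), "link": link.get("href")})
    return selected


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# The span matched by .titleline carries no href attribute.
span_href = soup.select(".titleline")[0].get("href")
result = links_above(SAMPLE_HTML, 100)
print(span_href)
print(result)
```

Note that pairing titles and scores by position only holds if every row has a score. If some rows on the real page have none, the two lists would drift out of step and the pairing would need to go through something more robust, such as the row ids.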