Nov-15-2019, 05:38 AM
Hi,
I'd like to scrape some text from a website, but my code doesn't work and I don't know how to fix it. Any ideas would be much appreciated.
Here are the details:
Website: https://voteview.com/rollcall/RH1030237
Scrape: each Congressman's "name", "state", and "vote", from the last section of the page: Votes (Sort by Party, State, Vote, Ideology, Vote Probability)
Here is my code:
import requests
import bs4
from bs4 import BeautifulSoup
def getHTMLText(url):
    try:
        r = requests.get(url)
        r.encoding = r.apparent_encoding
        return r.text
    except:
        return ""
def fillPList(plist, html):
    soup = BeautifulSoup(html, "html.parser")
    for li in soup.find('ul'):
        if isinstance(li, bs4.element.Tag):
            spans = li('span')
            plist.append([spans[0].string, spans[1].string, spans[2].string])
def printPList(plist, num):
    print("{:^10}\t{:^10}\t{:^10}".format("name", "state", "vote"))
    for i in range(num):
        p = plist[i]
        print("{:^10}\t{:^10}\t{:^10}".format(p[0], p[1], p[2]))
def main():
    pinfo = []
    url = 'https://voteview.com/rollcall/RH1030237'
    html = getHTMLText(url)
    fillPList(pinfo, html)
    printPList(pinfo, 435)
    with open(r'D:\KKKKKK\103_hr1876.csv', 'a', encoding='utf-8') as f:
        f.write("{},{},{}\n".format("name", "state", "vote"))
main()
Here is the error I got:
Error:
Traceback (most recent call last):
  File "D:/AA_Software/Pycharm/PycharmProjects/untitled/voteview.py", line 30, in <module>
    main()
  File "D:/AA_Software/Pycharm/PycharmProjects/untitled/voteview.py", line 26, in main
    fillPList(pinfo, html)
  File "D:/AA_Software/Pycharm/PycharmProjects/untitled/voteview.py", line 16, in fillPList
    plist.append([spans[0].string, spans[1].string, spans[2].string])
IndexError: list index out of range
I have looked into this error. From what I've read, it means the list is empty (or too short), so indexing into it fails. But the content I want does appear in the website's source code. This is my first post here, so sorry if it is confusing, and please tell me how to improve it. Many thanks!
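To show what I think the traceback is describing, here is a minimal sketch that does not touch the website at all. It assumes that some `<li>` elements in the first `<ul>` have fewer than three `<span>` tags, which I have not verified for this page. The plain lists in `fake_span_lists` are made-up stand-ins for what `li('span')` returns, and `collect_rows` is a hypothetical helper, not part of my script.

```python
# Stand-in data: each inner list plays the role of `li('span')` for one <li>.
# These values are invented for illustration; they are not from voteview.com.
fake_span_lists = [
    ["Member A", "XX", "Yea"],  # three items: indexes 0, 1 and 2 all exist
    [],                         # no items: spans[0] would raise IndexError
    ["Member B"],               # one item: spans[1] would raise IndexError
]


def collect_rows(span_lists):
    """Keep entries with at least three items and count the ones skipped."""
    rows = []
    skipped = 0
    for spans in span_lists:
        # Check the length first instead of indexing blindly.
        if len(spans) < 3:
            skipped += 1
            continue
        rows.append([spans[0], spans[1], spans[2]])
    return rows, skipped


rows, skipped = collect_rows(fake_span_lists)
print(rows)
print("skipped:", skipped)
```

If a length check like this skipped every `<li>` on the real page, that would suggest the first `<ul>` is not the vote list I am after, but that is only a guess.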