Getting past a none type error - Python Forum (https://python-forum.io) - Web Scraping & Web Development (/thread-4166.html)
Getting past a none type error - CodyW129 - Jul-27-2017

Hi Everyone,

I am in the early stages of creating a Python web-scraping script that will allow me to retrieve suspected spam accounts on a forum I help run and then add the information to a CSV file for later analysis in Excel.

In slightly greater detail, 99% of the spam accounts on the forum have some form of link in the signature on their profile. They don't actively spam the forum, they just sit there in the hope that someone will come by and click on the link in their profile, and there are a lot of these accounts that have been added since around 2012. If the difference between their signup date and their last log-in is less than or equal to one and a link has been found in the signature div class, the script will retrieve their user ID, username and the contents of the signature div class and write the information to the CSV file.

THE PROBLEM

I'm building this script up slowly. The first problem that I've encountered is that if a profile doesn't exist, a datatype of None is returned and the for loop will stop iterating. If None is returned, I want it to move on to the next user ID (UID), but I haven't had much luck trying to fix it.

    import requests
    from bs4 import BeautifulSoup

    UID_start = 58217
    UID_end = 58221

    for UID in range(UID_start, UID_end):
        page = requests.get("http forum.shipspotting.com/index.php?action=profile;u=" + str(UID))  # I have altered the link for this post to work
        soup = BeautifulSoup(page.content, 'html.parser')
        for a in soup.find("div", "signature"):
            if a is not None:
                print(a)
            else:
                UID_start += 1
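For reference, the failure described above comes from iterating over the return value of `soup.find(...)`: when nothing matches, `find` returns `None`, and `for a in None` raises `TypeError: 'NoneType' object is not iterable`, so the `if a is not None` check inside the loop is never reached. Below is a minimal offline sketch of the "skip and move on" pattern. It does not use the real forum or BeautifulSoup: `PAGES`, `find_signature` and `collect_signatures` are hypothetical stand-ins, the HTML snippets and the "does not exist" message are made-up sample data, and the regex helper only mimics the "return None when absent" behaviour of `soup.find`.

```python
import re

# Hypothetical sample pages keyed by UID; 58218 stands in for a missing profile.
PAGES = {
    58217: '<div class="signature"><a href="http://example.com">cheap stuff</a></div>',
    58218: '<div class="windowbg">This profile does not exist.</div>',
    58219: '<div class="signature">just text</div>',
}


def find_signature(html):
    """Return the signature's inner HTML, or None if there is no signature div.

    Stand-in for soup.find("div", "signature"), which also returns None
    when nothing matches.
    """
    match = re.search(r'<div class="signature">(.*?)</div>', html, re.S)
    return match.group(1) if match else None


def collect_signatures(pages):
    """Map each UID that has a signature to that signature, skipping the rest."""
    found = {}
    for uid in sorted(pages):
        signature = find_signature(pages[uid])
        if signature is None:
            # Check the result *before* using it, then move on to the next UID.
            continue
        found[uid] = signature
    return found


results = collect_signatures(PAGES)
for uid, signature in results.items():
    print(uid, signature)
```

The key point is that the `None` check happens on the value returned by the lookup, not on items inside a loop over it; `continue` then advances to the next UID without touching `UID_start`.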
RE: Getting past a none type error - buran - Jul-27-2017

There are two possible approaches.

One approach is EAFP - Easier to ask for forgiveness than permission. Note the try/except block to handle the error.

    import requests
    import time
    from bs4 import BeautifulSoup

    UID_start = 58217
    UID_end = 58221

    for UID in range(UID_start, UID_end):
        print(UID)
        page = requests.get("http://forum.shipspotting.com/index.php?action=profile;u=" + str(UID))  # I have altered the link for this post to work
        soup = BeautifulSoup(page.content, 'html.parser')
        div = soup.find("div", {'class': 'content'})
        signature = div.find("div", {'class': "signature"})  # None if there is no signature
        user = div.find('tr', {'class': 'titlebg'}).find('td').text.split(' ')[-1].strip()
        try:
            # signature.text raises AttributeError when signature is None
            print('User: {}\nSigniture: {}'.format(user, signature.text))
        except AttributeError:
            print(div.find('tr', {'class': 'windowbg'}).text.strip())
        print()
        time.sleep(1)  # add some sleep between requests

The other one is LBYL - Look before you leap. In this case, check what you are working with.

    import requests
    import time
    from bs4 import BeautifulSoup

    UID_start = 58217
    UID_end = 58221

    for UID in range(UID_start, UID_end):
        print(UID)
        page = requests.get("http://forum.shipspotting.com/index.php?action=profile;u=" + str(UID))  # I have altered the link for this post to work
        soup = BeautifulSoup(page.content, 'html.parser')
        div = soup.find("div", {'class': 'content'})
        signature = div.find("div", {'class': "signature"})  # None if there is no signature
        if signature is not None:
            user = div.find('tr', {'class': 'titlebg'}).find('td').text.split(' ')[-1].strip()
            print('User: {}\nSigniture: {}'.format(user, signature.text))
        else:
            print(div.find('tr', {'class': 'windowbg'}).text.strip())
        print()
        time.sleep(1)  # add some sleep between requests

In both cases the output is