Getting past a none type error

Getting past a none type error - CodyW129 - Jul-27-2017

Hi everyone,

I am in the early stages of creating a Python web-scraping script that will let me retrieve suspected spam accounts on a forum I help run and then add the information to a CSV file for later analysis in Excel.

In slightly greater detail, 99% of the spam accounts on the forum have some form of link in the signature on their profile. They don’t actively spam the forum; they just sit there in the hope that someone will come by and click on the link in their profile, and there are a lot of these accounts that have been added since around 2012. If the difference between their signup date and their last log-in is less than or equal to one and a link has been found in the signature div class, the script will retrieve their user ID, username and the contents of the signature div class and write the information to the CSV file.

THE PROBLEM

I’m building this script up slowly. The first problem I’ve encountered is that if a profile doesn’t exist, a value of None is returned and the for loop stops iterating. When None is returned, I want the script to move on to the next user ID (UID), but I haven’t had much luck trying to fix it.

import requests
from bs4 import BeautifulSoup

UID_start = 58217
UID_end = 58221

for UID in range(UID_start, UID_end):

    page = requests.get("http forum.shipspotting.com/index.php?action=profile;u=" + str(UID)) #I have altered the link for this post to work
    soup = BeautifulSoup(page.content, 'html.parser')

    for a in soup.find("div", "signature"):
        if a is not None:
            print(a)
        else:
            UID_start += 1

Error:
Traceback (most recent call last):
  File "/Users/Cody/PycharmProjects/Webcrawl/webcrawl.py", line 12, in <module>
    for a in soup.find("div", "signature"):
TypeError: 'NoneType' object is not iterable

Process finished with exit code 1
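
For readers following along: the traceback comes from looping over the result of soup.find(), which returns None when nothing matches, so the "is not None" test inside the loop is never reached. The range() loop already advances the UID, so incrementing UID_start is not needed. Below is a minimal offline sketch of checking the result before using it. The two HTML snippets and the get_signature helper are made up for illustration, and the real forum markup may differ.

```python
from bs4 import BeautifulSoup

# Invented HTML snippets for illustration; the real forum markup may differ.
EXISTING_PROFILE = '<div class="content"><div class="signature">My site</div></div>'
MISSING_PROFILE = '<div class="content">The user whose profile you are trying to view does not exist.</div>'


def get_signature(html):
    """Return the signature text, or None when the page has no signature div."""
    soup = BeautifulSoup(html, "html.parser")
    signature = soup.find("div", class_="signature")  # None when nothing matches
    if signature is None:
        return None
    return signature.get_text(strip=True)


for html in (EXISTING_PROFILE, MISSING_PROFILE):
    signature = get_signature(html)
    if signature is None:
        continue  # no signature div: move on to the next page instead of crashing
    print(signature)
```

Only the first snippet prints anything; the second is skipped without raising an error.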

RE: Getting past a none type error - buran - Jul-27-2017

There are two possible approaches.

One approach is EAFP (easier to ask for forgiveness than permission).
Note the try/except block that handles the error:

import requests
import time
from bs4 import BeautifulSoup
 
UID_start = 58217
UID_end = 58221
 
for UID in range(UID_start, UID_end):
    print(UID)
    page = requests.get("http://forum.shipspotting.com/index.php?action=profile;u=" + str(UID))
    soup = BeautifulSoup(page.content, 'html.parser')
    div = soup.find("div", {'class':'content'})
    signature = div.find("div", {'class':"signature"})  # None if there is no signature div
    user = div.find('tr', {'class':'titlebg'}).find('td').text.split(' ')[-1].strip()
    try:
        # signature.text raises AttributeError when signature is None
        print('User: {}\nSigniture: {}'.format(user, signature.text))
    except AttributeError:
        # print the text of the windowbg row instead
        print(div.find('tr', {'class':'windowbg'}).text.strip())
    print()  # blank line between profiles
    time.sleep(1) # add some sleep between requests

The other one is LBYL (look before you leap).
In this case, check what you are working with before you use it:

import requests
import time
from bs4 import BeautifulSoup
 
UID_start = 58217
UID_end = 58221
 
for UID in range(UID_start, UID_end):
    print(UID)
    page = requests.get("http://forum.shipspotting.com/index.php?action=profile;u=" + str(UID))
    soup = BeautifulSoup(page.content, 'html.parser')
    div = soup.find("div", {'class':'content'})
    signature = div.find("div", {'class':"signature"})  # None if there is no signature div
    if signature is not None:
        # only read the user name and signature when the signature div exists
        user = div.find('tr', {'class':'titlebg'}).find('td').text.split(' ')[-1].strip()
        print('User: {}\nSigniture: {}'.format(user, signature.text))
    else:
        # print the text of the windowbg row instead
        print(div.find('tr', {'class':'windowbg'}).text.strip())
    print()  # blank line between profiles
    time.sleep(1) # add some sleep between requests

In both cases the output is:

Output:
58217
User: nikond3100cam2
Signiture: <a href="http://www.nikon-d3100.com">Nikon D3100</a> - Digital Camera
D3100 NikonNikon DSLR D3100Nikon D3100Nik

58218
User: alvinswaim
Signiture:

58219
The user whose profile you are trying to view does not exist.

58220
The user whose profile you are trying to view does not exist.
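
Once missing profiles are skipped, the filtering rule from the opening post could be sketched roughly as follows. This is only an illustration under several assumptions: the date difference is taken to be in days (the post does not state the unit), a link is detected with a crude substring test, and the function name, sample rows and column names are all hypothetical. An in-memory buffer stands in for the CSV file so the sketch runs without touching disk.

```python
import csv
import io
from datetime import date


def is_suspected_spam(signup, last_login, signature):
    """Flag a short-lived account whose signature contains a link."""
    days_active = (last_login - signup).days  # assumption: difference measured in days
    has_link = "http://" in signature or "https://" in signature  # assumption: crude link test
    return days_active <= 1 and has_link


# Hypothetical rows: (user ID, username, signup date, last log-in date, signature text).
profiles = [
    (1, "example_spammer", date(2012, 5, 1), date(2012, 5, 1), "Visit http://example.com"),
    (2, "example_regular", date(2012, 5, 1), date(2017, 7, 27), "Ship photos at http://example.org"),
    (3, "example_quiet", date(2013, 1, 10), date(2013, 1, 10), ""),
]

buffer = io.StringIO()  # stands in for a real file opened with newline=""
writer = csv.writer(buffer)
writer.writerow(["uid", "username", "signature"])
for uid, username, signup, last_login, signature in profiles:
    if is_suspected_spam(signup, last_login, signature):
        writer.writerow([uid, username, signature])

print(buffer.getvalue())
```

Keeping the rule in its own function makes it easy to adjust the threshold or the link test later without touching the scraping loop.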