Python Forum
Getting past a none type error
#1
Hi everyone,

I am in the early stages of creating a Python web-scraping script that will retrieve suspected spam accounts on a forum I help run and add the information to a CSV file for later analysis in Excel.

In slightly greater detail, 99% of the spam accounts on the forum have some form of link in the signature on their profile. They don't actively spam the forum; they just sit there in the hope that someone will come by and click on the link in their profile, and there are a lot of these accounts that have been added since around 2012.

If the difference between their signup date and their last log-in is less than or equal to one and a link has been found in the signature div class, the script will retrieve their user ID, username and the contents of the signature div class and write the information to the CSV file.

THE PROBLEM

I'm building this script up slowly. The first problem I've encountered is that if a profile doesn't exist, None is returned and the for loop stops iterating. If None is returned, I want it to move on to the next user ID (UID), but I haven't had much luck trying to fix it.
import requests
from bs4 import BeautifulSoup

UID_start = 58217
UID_end = 58221

for UID in range(UID_start, UID_end):

    page = requests.get("http forum.shipspotting.com/index.php?action=profile;u=" + str(UID)) #I have altered the link for this post to work
    soup = BeautifulSoup(page.content, 'html.parser')

    for a in soup.find("div", "signature"):
        if a is not None:
            print(a)
        else:
            UID_start += 1
Error:
Traceback (most recent call last):
  File "/Users/Cody/PycharmProjects/Webcrawl/webcrawl.py", line 12, in <module>
    for a in soup.find("div", "signature"):
TypeError: 'NoneType' object is not iterable

Process finished with exit code 1
#2
There are two possible approaches.

One approach is EAFP: easier to ask for forgiveness than permission.
Note the try/except block that handles the error.

import requests
import time
from bs4 import BeautifulSoup
 
UID_start = 58217
UID_end = 58221
 
for UID in range(UID_start, UID_end):
    print(UID)
    page = requests.get("http://forum.shipspotting.com/index.php?action=profile;u=" + str(UID))
    soup = BeautifulSoup(page.content, 'html.parser')
    div = soup.find("div", {'class':'content'})
    signature = div.find("div", {'class':"signature"})
    user = div.find('tr', {'class':'titlebg'}).find('td').text.split(' ')[-1].strip()
    try:
        print('User: {}\nSigniture: {}'.format(user, signature.text))
    except AttributeError:
        print(div.find('tr', {'class':'windowbg'}).text.strip())
    print()
    time.sleep(1) # add some sleep between requests
The other one is LBYL: look before you leap.
In this case, you check what you are working with before you use it.

import requests
import time
from bs4 import BeautifulSoup
 
UID_start = 58217
UID_end = 58221
 
for UID in range(UID_start, UID_end):
    print(UID)
    page = requests.get("http://forum.shipspotting.com/index.php?action=profile;u=" + str(UID))
    soup = BeautifulSoup(page.content, 'html.parser')
    div = soup.find("div", {'class':'content'})
    signature = div.find("div", {'class':"signature"})
    if signature is not None:
        user = div.find('tr', {'class':'titlebg'}).find('td').text.split(' ')[-1].strip()
        print('User: {}\nSigniture: {}'.format(user, signature.text))
    else:
        print(div.find('tr', {'class':'windowbg'}).text.strip())
    print()
    time.sleep(1) # add some sleep between requests
In both cases the output is:

Output:
58217
User: nikond3100cam2
Signiture: <a href="http://www.nikon-d3100.com">Nikon D3100</a> - Digital Camera D3100 NikonNikon DSLR D3100Nikon D3100Nik

58218
User: alvinswaim
Signiture:

58219
The user whose profile you are trying to view does not exist.

58220
The user whose profile you are trying to view does not exist.
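
As a follow-up, here is a rough sketch of how the same "skip it if it is None" idea could feed the CSV file mentioned in the first post. It is only an illustration under some assumptions: fetch_profile and FAKE_PROFILES are hypothetical stand-ins for the requests/BeautifulSoup lookups so the example runs offline, the signature text is shortened, and the column names are made up.

```python
import csv
import io

# Hypothetical stand-in for the scraping step: maps each UID to
# (username, signature_text), or to None when the profile does not exist.
# In the real script these values would come from the BeautifulSoup lookups.
FAKE_PROFILES = {
    58217: ("nikond3100cam2", "Nikon D3100 - Digital Camera"),
    58218: ("alvinswaim", ""),
    58219: None,
    58220: None,
}


def fetch_profile(uid):
    """Return (username, signature) for uid, or None if there is no profile."""
    return FAKE_PROFILES.get(uid)


def collect_rows(uid_start, uid_end):
    """Build CSV rows for profiles that exist and have a non-empty signature."""
    rows = []
    for uid in range(uid_start, uid_end):
        profile = fetch_profile(uid)
        if profile is None:
            # Nothing to unpack here: move on to the next UID instead of crashing.
            continue
        username, signature = profile
        if signature:
            rows.append((uid, username, signature))
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["uid", "username", "signature"])  # assumed column names
    writer.writerows(rows)
    return buffer.getvalue()


rows = collect_rows(58217, 58221)
csv_text = rows_to_csv(rows)
print(csv_text)
```

To write a real file, the StringIO buffer could be swapped for open("spam_accounts.csv", "w", newline=""), where the file name is again just a placeholder.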