Python Forum
AttributeError: 'NoneType' object in a parser - stops it
#1
Dear Python experts, good day!



I'm currently working on a parser to make a small preview of a page from a URL given by the user in PHP.

I'd like to retrieve only the title of the page and a little chunk of information (a bit of text).

The project: build a list of metadata for popular WordPress plugins (cf. https://de.wordpress.org/plugins/browse/popular/) by gathering the first 50 URLs, i.e. the 50 plugins which are of interest. The challenge: I want to fetch the metadata of all the existing plugins. What I subsequently want to filter out after the fetch are the plugins with the newest timestamp, i.e. those that were updated most recently. It is all about being up to date...

https://wordpress.org/plugins/wp-job-manager
https://wordpress.org/plugins/ninja-forms
https://wordpress.org/plugins/participants-database
... and so on.
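For the "most recently updated" filter: the parser code in this post keeps sidebar lines starting with `"Las"`, which suggests the pages show the update time as relative text such as "Last updated: 3 days ago". Assuming that wording (not verified against every page), a minimal sketch can turn it into an approximate age in days and sort the collected results newest first. The helper names `approx_age_days` and `newest_first` are hypothetical, and the month/year lengths are rounded, so this is an ordering aid rather than an exact date calculation.

```python
import re

# Approximate length of each unit in days (months and years are rounded).
UNIT_DAYS = {
    "minute": 1 / 1440,
    "hour": 1 / 24,
    "day": 1,
    "week": 7,
    "month": 30,
    "year": 365,
}

# Assumed format: "Last updated: 3 days ago" or "Last updated: 1 week ago".
PATTERN = re.compile(r"(\d+)\s+(minute|hour|day|week|month|year)s?\s+ago")


def approx_age_days(text):
    """Return the approximate age in days, or None if the text doesn't match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    count, unit = int(match.group(1)), match.group(2)
    return count * UNIT_DAYS[unit]


def newest_first(results):
    """Sort parser results (lists of strings) by their 'Last updated' entry."""
    def key(row):
        for field in row:
            if field.startswith("Last updated"):
                age = approx_age_days(field)
                if age is not None:
                    return age
        return float("inf")  # rows without a parsable date go last
    return sorted(results, key=key)


rows = [
    ["Plugin A", "Version: 1.0", "Last updated: 2 weeks ago"],
    ["Plugin B", "Version: 2.3", "Last updated: 3 days ago"],
    ["Plugin C", "Version: 0.9"],
]
for row in newest_first(rows):
    print(row[0])  # Plugin B, then Plugin A, then Plugin C
```

If the site ever shows absolute dates instead, only `approx_age_days` would need to change.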



import requests
from bs4 import BeautifulSoup
from concurrent.futures.thread import ThreadPoolExecutor

url = "https://wordpress.org/plugins/browse/popular/{}"


def main(url, num):
    with requests.Session() as req:
        print(f"Collecting Page# {num}")
        r = req.get(url.format(num))
        soup = BeautifulSoup(r.content, 'html.parser')
        link = [item.get("href")
                for item in soup.findAll("a", rel="bookmark")]
        return set(link)


with ThreadPoolExecutor(max_workers=20) as executor:
    futures = [executor.submit(main, url, num)
               for num in [""]+[f"page/{x}/" for x in range(2, 50)]]

allin = []
for future in futures:
    allin.extend(future.result())


def parser(url):
    with requests.Session() as req:
        print(f"Extracting {url}")
        r = req.get(url)
        soup = BeautifulSoup(r.content, 'html.parser')
        target = [item.get_text(strip=True, separator=" ") for item in soup.find(
            "h3", class_="screen-reader-text").find_next("ul").findAll("li")[:8]]
        head = [soup.find("h1", class_="plugin-title").text]
        new = [x for x in target if x.startswith(
            ("V", "Las", "Ac", "W", "T", "P"))]
        return head + new


with ThreadPoolExecutor(max_workers=50) as executor1:
    futures1 = [executor1.submit(parser, url) for url in allin]

for future in futures1:
    print(future.result())
    



See the results:

Extracting https://wordpress.org/plugins/tuxedo-big-file-uploads/Extracting https://wordpress.org/plugins/cherry-sidebars/
Extracting https://wordpress.org/plugins/meks-smart-author-widget/
Extracting https://wordpress.org/plugins/wp-limit-login-attempts/

Extracting https://wordpress.org/plugins/automatic-translator-addon-for-loco-translate/
Extracting https://wordpress.org/plugins/event-organiser/
Traceback (most recent call last):
  File "/home/martin/unbenannt0.py", line 45, in <module>
    print(future.result())
  File "/home/martin/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/home/martin/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/martin/anaconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/martin/unbenannt0.py", line 34, in parser
    "h3", class_="screen-reader-text").find_next("ul").findAll("li")[:8]]
AttributeError: 'NoneType' object has no attribute 'find_next'

    
Well, I have a severe error:

AttributeError: 'NoneType' object has no attribute 'find_next'

At the moment I do not know how to fix this; it goes a bit over my head. Any help would be appreciated.


By the way: besides fixing this error, I want to add an option that writes the results out in CSV format.
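For the CSV option, one possible approach (a sketch, not tested against the live pages) is to split each `"Key: value"` string returned by `parser()` into its own column and write the rows with the standard `csv` module. It assumes each result list starts with the plugin title, followed by strings such as `"Version: 1.2.3"`; the helper names `to_record` and `write_csv` are hypothetical.

```python
import csv
import io


def to_record(result):
    """Turn ['Title', 'Version: 1.0', ...] into {'Title': ..., 'Version': ...}."""
    record = {"Title": result[0]}
    for field in result[1:]:
        key, sep, value = field.partition(":")
        if sep:  # skip fields that don't have a "Key: value" shape
            record[key.strip()] = value.strip()
    return record


def write_csv(results, stream):
    """Write all results to an open text stream as CSV with a header row."""
    records = [to_record(r) for r in results]
    # Collect every column name that appears in any record, keeping order.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    # DictWriter fills columns missing from a record with "" by default.
    writer = csv.DictWriter(stream, fieldnames=columns)
    writer.writeheader()
    writer.writerows(records)


results = [
    ["Plugin A", "Version: 1.0", "Last updated: 3 days ago"],
    ["Plugin B", "Version: 2.3", "Active installations: 10,000+"],
]
buffer = io.StringIO()
write_csv(results, buffer)
print(buffer.getvalue())
```

To write a real file instead, pass a stream opened with `open("plugins.csv", "w", newline="", encoding="utf-8")` (the file name is just an example).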

Many thanks in advance!

Yours, apollo
#2
It looks like soup.find("h3", class_="screen-reader-text") has not found anything.
You could either break this line up and only call find_next if there was a result or use a try/except that captures the AttributeError.
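Following that suggestion, the `target` line from `parser()` can be split into steps so that `find_next` is only called when the `<h3>` was actually found. This is a sketch: the helper name `extract_details` is hypothetical, and returning an empty list for pages without the expected markup is just one possible choice (you could also log the URL and skip it).

```python
from bs4 import BeautifulSoup


def extract_details(soup, limit=8):
    """Return the sidebar <li> texts, or [] if the expected markup is missing."""
    heading = soup.find("h3", class_="screen-reader-text")
    if heading is None:  # soup.find() returns None when nothing matches
        return []
    details = heading.find_next("ul")
    if details is None:
        return []
    return [li.get_text(strip=True, separator=" ")
            for li in details.find_all("li")[:limit]]


page_ok = """<h3 class="screen-reader-text">Meta</h3>
<ul><li>Version: <strong>1.0</strong></li><li>Last updated: 3 days ago</li></ul>"""
page_missing = "<h1 class='plugin-title'>No sidebar here</h1>"

print(extract_details(BeautifulSoup(page_ok, "html.parser")))
print(extract_details(BeautifulSoup(page_missing, "html.parser")))
```

The same guard is worth applying to the `h1` title lookup in `parser()`: on a page without that heading, `.text` on `None` raises the same kind of `AttributeError`.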
#3
Please show the complete and unaltered error traceback.
It contains valuable call stack information.

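You can surround the offending code with:
try:
    target = [item.get_text(strip=True, separator=" ") for item in soup.find(
        "h3", class_="screen-reader-text").find_next("ul").findAll("li")[:8]]
except AttributeError:
    print(f"AttributeError on {url}: no <h3 class='screen-reader-text'> found")
    target = []  # ... or whatever action you wish to take here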
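Because `future.result()` re-raises whatever exception happened inside the worker thread, a single failing page ends the final `for` loop. A minimal sketch of catching the error per future instead, so the remaining results are still collected; `fake_parser` and the example.org URLs are hypothetical stand-ins for `parser()` and the plugin URLs, so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_parser(url):
    """Stand-in for parser(): fails the same way on a page without the <h3>."""
    if "broken" in url:
        raise AttributeError("'NoneType' object has no attribute 'find_next'")
    return [f"Title of {url}"]


urls = ["https://example.org/a", "https://example.org/broken", "https://example.org/c"]

with ThreadPoolExecutor(max_workers=3) as executor:
    # Map each future back to its URL so failures can be reported.
    futures = {executor.submit(fake_parser, u): u for u in urls}

results, failed = [], []
for future, u in futures.items():
    try:
        results.append(future.result())  # re-raises the worker's exception
    except AttributeError as exc:
        failed.append(u)
        print(f"AttributeError on {u}: {exc}")

print(results)
print(failed)
```

Keeping the failed URLs in a list makes it easy to inspect those pages afterwards and see which markup they use instead.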
#4
Hello dear Larz60+ and dear Yoriz,

first of all, many, many thanks for the quick replies. Well, I have to admit that this little script goes beyond my current knowledge.

I have no clue what goes wrong here.

PS: perhaps I have to take a closer look at the conditions in the script.



By the way, here is the full output: https://pastebin.com/nBXUW1a1

And here is some more; I hope it helps a bit:

Extracting https://wordpress.org/plugins/automatic-translator-addon-for-loco-translate/
Extracting https://wordpress.org/plugins/wpforo/Extracting https://wordpress.org/plugins/accesspress-social-share/
Extracting https://wordpress.org/plugins/mailoptin/
Extracting https://wordpress.org/plugins/tuxedo-big-file-uploads/

Extracting https://wordpress.org/plugins/post-snippets/
Extracting https://wordpress.org/plugins/woocommerce-payfast-gateway/Extracting https://wordpress.org/plugins/woocommerce-grid-list-toggle/

Extracting https://wordpress.org/plugins/goodbye-captcha/
Extracting https://wordpress.org/plugins/gravity-forms-google-analytics-event-tracking/
Traceback (most recent call last):
  File "/home/martin/dev/wordpress_plugin.py", line 44, in <module>
    print(future.result())
  File "/home/martin/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/home/martin/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/martin/anaconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/martin/dev/wordpress_plugin.py", line 33, in parser
    "h3", class_="screen-reader-text").find_next("ul").findAll("li")[:8]]
AttributeError: 'NoneType' object has no attribute 'find_next'

I look forward to hearing from you again.

Yours, apollo
#5
If you want to know more about error handling, here's the link to the Python docs: https://docs.python.org/3.9/tutorial/errors.html and Corey Schafer has a video on error handling: https://www.youtube.com/watch?v=NIWwJbo-9_8

