AttributeError: 'NoneType' object in a parser - stops it
Python Forum (https://python-forum.io), Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html), Thread: /thread-33590.html

AttributeError: 'NoneType' object in a parser - stops it - apollo - May-09-2021

Dear Python experts,

good day. I'm currently working on a parser to make a small preview of a page from a URL given by the user in PHP. I'd like to retrieve only the title of the page and a little chunk of information (a bit of text).

The project: build a list of metadata for popular WordPress plugins (cf. https://de.wordpress.org/plugins/browse/popular/), gathering the first 50 URLs, i.e. the 50 plugins that are of interest.

The challenge: I want to fetch the metadata of all the existing plugins. What I subsequently want to filter out after the fetch is those plugins that have the newest timestamp, i.e. the ones that were updated most recently. It is all about being up to date...

https://wordpress.org/plugins/wp-job-manager
https://wordpress.org/plugins/ninja-forms
https://wordpress.org/plugins/participants-database
...and so on and so forth.

    import requests
    from bs4 import BeautifulSoup
    from concurrent.futures.thread import ThreadPoolExecutor

    url = "https://wordpress.org/plugins/browse/popular/{}"


    def main(url, num):
        with requests.Session() as req:
            print(f"Collecting Page# {num}")
            r = req.get(url.format(num))
            soup = BeautifulSoup(r.content, 'html.parser')
            link = [item.get("href")
                    for item in soup.findAll("a", rel="bookmark")]
            return set(link)


    with ThreadPoolExecutor(max_workers=20) as executor:
        futures = [executor.submit(main, url, num)
                   for num in [""]+[f"page/{x}/" for x in range(2, 50)]]

    allin = []
    for future in futures:
        allin.extend(future.result())


    def parser(url):
        with requests.Session() as req:
            print(f"Extracting {url}")
            r = req.get(url)
            soup = BeautifulSoup(r.content, 'html.parser')
            target = [item.get_text(strip=True, separator=" ") for item in soup.find(
                "h3", class_="screen-reader-text").find_next("ul").findAll("li")[:8]]
            head = [soup.find("h1", class_="plugin-title").text]
            new = [x for x in target if x.startswith(
                ("V", "Las", "Ac", "W", "T", "P"))]
            return head + new


    with ThreadPoolExecutor(max_workers=50) as executor1:
        futures1 = [executor1.submit(parser, url) for url in allin]

    for future in futures1:
        print(future.result())

See the results:

    Extracting https://wordpress.org/plugins/tuxedo-big-file-uploads/Extracting https://wordpress.org/plugins/cherry-sidebars/
    Extracting https://wordpress.org/plugins/meks-smart-author-widget/
    Extracting https://wordpress.org/plugins/wp-limit-login-attempts/
    Extracting https://wordpress.org/plugins/automatic-translator-addon-for-loco-translate/
    Extracting https://wordpress.org/plugins/event-organiser/
    Traceback (most recent call last):
      File "/home/martin/unbenannt0.py", line 45, in <module>
        print(future.result())
      File "/home/martin/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 428, in result
        return self.__get_result()
      File "/home/martin/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
        raise self._exception
      File "/home/martin/anaconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/home/martin/unbenannt0.py", line 34, in parser
        "h3", class_="screen-reader-text").find_next("ul").findAll("li")[:8]]
    AttributeError: 'NoneType' object has no attribute 'find_next'

Well, I have a severe error: AttributeError: 'NoneType' object has no attribute 'find_next'. At the moment I do not know how to fix this; it goes a bit over my head. Any help would be appreciated.

By the way, besides this error I want to add an option that returns the results in CSV format.

Many thanks in advance,
yours apollo

RE: AttributeError: 'NoneType' object in a parser - stops it - Yoriz - May-09-2021

It looks like soup.find("h3", class_="screen-reader-text") has not found anything. You could either break this line up and only call find_next if there was a result, or use a try/except that captures the AttributeError.

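The first option above (break the chained line up and only call find_next when find actually returned something) could look roughly like the sketch below. The helper name `extract_fields` and the choice to return None for a page without the expected markup are assumptions made for illustration, not something from the thread. The Beautiful Soup calls are the same ones the original parser uses, with `find_all` as the current spelling of `findAll`.

```python
def extract_fields(soup):
    """Return [title, meta...] for one plugin page, or None if markup is missing.

    `soup` is expected to be a BeautifulSoup object for a plugin page.
    `extract_fields` is a hypothetical helper name, not part of any library.
    """
    # find() returns None when nothing matches, so check before chaining.
    heading = soup.find("h3", class_="screen-reader-text")
    if heading is None:
        return None

    # find_next() can also come back empty, so check again.
    meta_list = heading.find_next("ul")
    if meta_list is None:
        return None

    title = soup.find("h1", class_="plugin-title")
    if title is None:
        return None

    items = [li.get_text(strip=True, separator=" ")
             for li in meta_list.find_all("li")[:8]]
    # Same prefix filter as in the original parser().
    wanted = [x for x in items
              if x.startswith(("V", "Las", "Ac", "W", "T", "P"))]
    return [title.get_text(strip=True)] + wanted


# Possible use inside parser(), after building the soup:
#     fields = extract_fields(soup)
#     if fields is None:
#         print(f"Skipping {url}: expected markup not found")
```

With this shape, a page that lacks the heading is skipped instead of raising inside the worker thread, and the caller decides what to do with the None result.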
RE: AttributeError: 'NoneType' object in a parser - stops it - Larz60+ - May-09-2021

Please show the complete and unaltered error traceback. It contains valuable process call stack information.

You can surround the offending code with:

    try:
        code that causes error
    except AttributeError:
        print(f"Attribution error on {some data here}, {whatever else would be of value}, {...}")
        ... whatever action you wish to take here.

RE: AttributeError: 'NoneType' object in a parser - stops it - apollo - May-10-2021

Hello dear Larz60+ and dear Yoriz,

first of all, many many thanks for the quick reply. Well, I have to admit that this little script goes over my current knowledge limit. I have no clue what goes wrong here. Perhaps I have to take a closer look at the conditions of the script.

By the way, see the full output here: https://pastebin.com/nBXUW1a1

And here is more; I hope that helps a bit:

    Extracting https://wordpress.org/plugins/automatic-translator-addon-for-loco-translate/
    Extracting https://wordpress.org/plugins/wpforo/Extracting https://wordpress.org/plugins/accesspress-social-share/
    Extracting https://wordpress.org/plugins/mailoptin/
    Extracting https://wordpress.org/plugins/tuxedo-big-file-uploads/
    Extracting https://wordpress.org/plugins/post-snippets/
    Extracting https://wordpress.org/plugins/woocommerce-payfast-gateway/Extracting https://wordpress.org/plugins/woocommerce-grid-list-toggle/
    Extracting https://wordpress.org/plugins/goodbye-captcha/
    Extracting https://wordpress.org/plugins/gravity-forms-google-analytics-event-tracking/
    Traceback (most recent call last):
      File "/home/martin/dev/wordpress_plugin.py", line 44, in <module>
        print(future.result())
      File "/home/martin/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 428, in result
        return self.__get_result()
      File "/home/martin/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
        raise self._exception
      File "/home/martin/anaconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/home/martin/dev/wordpress_plugin.py", line 33, in parser
        "h3", class_="screen-reader-text").find_next("ul").findAll("li")[:8]]
    AttributeError: 'NoneType' object has no attribute 'find_next'

I look forward to hearing from you again,
yours apollo

RE: AttributeError: 'NoneType' object in a parser - stops it - Daring_T - May-28-2021

If you want to know more about error handling, here's the link to the Python docs: https://docs.python.org/3.9/tutorial/errors.html and Corey Schafer has a video on error handling: https://www.youtube.com/watch?v=NIWwJbo-9_8
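
As a rough illustration of the try/except route suggested in the thread, combined with the CSV output asked for in the first post, the sketch below catches the AttributeError where `future.result()` re-raises it, remembers which URL failed, and writes the successful rows with the standard `csv` module. The names `collect` and `write_csv`, the `worker` parameter, and the file name `plugins.csv` are assumptions for illustration only; `worker` stands in for the thread's `parser` function.

```python
import csv
from concurrent.futures import ThreadPoolExecutor


def collect(urls, worker, max_workers=10):
    """Run worker(url) for every URL; return (rows, failed_urls).

    `collect` and `worker` are hypothetical names; `worker` would be
    the parser() function from the original post.
    """
    rows, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Keep each URL next to its future so a failure can be reported.
        futures = [(url, executor.submit(worker, url)) for url in urls]
        for url, future in futures:
            try:
                # result() re-raises any exception raised inside the worker.
                rows.append(future.result())
            except AttributeError as exc:
                print(f"Attribute error on {url}: {exc}")
                failed.append(url)
    return rows, failed


def write_csv(rows, fileobj):
    """Write one CSV row per plugin; rows may differ in length."""
    writer = csv.writer(fileobj)
    writer.writerows(rows)


# Possible use with the names from the original post:
#     rows, failed = collect(allin, parser, max_workers=50)
#     with open("plugins.csv", "w", newline="", encoding="utf-8") as f:
#         write_csv(rows, f)
```

Catching the exception per future means one page with unexpected markup no longer stops the whole run, and the `failed` list shows which pages to inspect by hand.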