Custom method to handle exceptions not working as expected


Custom method to handle exceptions not working as expected - gradlon93 - Dec-22-2022

Good afternoon,

I am working on a web scraping project using BeautifulSoup and, on occasion, Selenium webdriver.

BeautifulSoup's .findAll(...) method always returns a list, but that list may be empty.
I thought I would write a custom method that handles IndexError, so that I don't need a try/except every time I scrape something.

        def get_element(element, idx_error=None) -> Any:
            # TODO: This method does not work, idx error is not handled. FIX IT AS SOON AS POSSIBLE!!!
            value_ = None
            try:
                value_ = element

            except IndexError:
                if inspect.isfunction(idx_error):
                    idx_error()
                else:
                    return idx_error

Basically, I want to return value_ only if it exists; if the operation raises an IndexError, I want idx_error to be returned, or called if it is a function/method.
This data will then be used in a GUI, so I would rather get a placeholder value (e.g. 'Untitled') than a blocking error.

Example usage:

title_selector = content.findAll('h1', attrs={"itemprop": "title"})  # Returns a list of page elements
title = get_element(title_selector[0].text, 'Untitled') # Returns first element in title_selector if there is one; if IndexError returns 'Untitled'

Which should be the equivalent of this:

title_selector = content.findAll('h1', attrs={"itemprop": "title"})
title = None
try:
    title = title_selector[0].text
except IndexError:
    title = 'Untitled'

Unfortunately, for a reason I can't figure out, the method has no effect and the IndexError is raised anyway.

Any ideas?

RE: Custom method to handle exceptions not working as expected - Yoriz - Dec-22-2022

This won't work because in the line

title = get_element(title_selector[0].text, 'Untitled') # Returns first element in title_selector if there is one; if IndexError returns 'Untitled'

the IndexError is raised while Python evaluates the argument title_selector[0], that is, while it indexes the list, which happens before get_element is ever called.
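
To make the order of evaluation visible, here is a minimal sketch. It assumes an empty list stands in for a findAll() result with no matches, and it uses a simplified stand-in for get_element rather than the version from the question.

```python
def get_element(element, idx_error=None):
    """Simplified stand-in for the helper from the question."""
    print("inside get_element")  # never printed when the argument expression fails
    return element


title_selector = []  # stands in for a findAll() result with no matches

try:
    # Python evaluates title_selector[0] first, so the IndexError is raised
    # here, at the call site, before get_element() starts running.
    title = get_element(title_selector[0], 'Untitled')
except IndexError:
    title = 'Untitled'

print(title)
```

The message "inside get_element" is never printed, which shows that the try/except inside the function has no chance to catch the error.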

You could check whether the list is empty first:

if not title_selector:
    title = 'Untitled'
else:
    title = whatever_you_want_to_do_instead
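
If the same check is needed in many places, it could be wrapped in a small helper. This is only a sketch: first_text is a hypothetical name, and SimpleNamespace objects stand in for the tag objects that findAll() would return.

```python
from types import SimpleNamespace


def first_text(elements, default='Untitled'):
    """Return the .text of the first element, or default if the list is empty."""
    if not elements:
        return default
    return elements[0].text


# Stand-ins for findAll() results (assumption: real tags expose .text the same way).
found = [SimpleNamespace(text='My page title')]
not_found = []

print(first_text(found))
print(first_text(not_found))
```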

RE: Custom method to handle exceptions not working as expected - gradlon93 - Dec-22-2022

(Dec-22-2022, 06:13 PM)Yoriz Wrote: This won't work because in the line
title = get_element(title_selector[0].text, 'Untitled') # Returns first element in title_selector if there is one; if IndexError returns 'Untitled'
that is where the IndexError will be raised before it gets to the function as it is trying to index the list

You could check if the list is empty
if not title_selector:
    title = 'Untitled'
else:
    title = whatever_you_want_to_do_instead

If I understand you correctly, Python raises the IndexError at the exact moment it evaluates the element argument for the call, so execution never even enters the function. That makes sense, thank you for your insight.

I changed my method as follows, and now it works.

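import inspect
import warnings


def get_element(element: list, idx: int, idx_error):
    """Return element[idx]; on IndexError, run or return idx_error instead."""
    try:
        # Index once, inside the try block, so a missing item is caught here.
        return element[idx]
    except IndexError:
        warnings.warn('Element not found on page.')
        if inspect.isfunction(idx_error):
            # Run the fallback function and pass its result back to the caller.
            return idx_error()
        return idx_error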
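
An alternative that keeps the call close to the original one-liner is to pass a zero-argument callable, so that the indexing only happens inside the try block. The sketch below is not from the thread: get_or_default is a hypothetical name, and an empty list stands in for a findAll() result.

```python
from typing import Any, Callable


def get_or_default(getter: Callable[[], Any], default: Any = None) -> Any:
    """Call getter() and return its result, or default on IndexError."""
    try:
        return getter()
    except IndexError:
        return default


title_selector = []  # stands in for a findAll() result with no matches

# The lambda delays title_selector[0].text until getter() runs inside the try.
title = get_or_default(lambda: title_selector[0].text, 'Untitled')
print(title)
```

The trade-off is a lambda at every call site, but the helper no longer needs to know about lists or indices.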

RE: Custom method to handle exceptions not working as expected - deanhystad - Dec-22-2022

I don't understand why you would use indexing with find_all(). I would expect code to iterate over the find_all() results, and an index error would not be possible.

Can you post an example of code where indexing is required? In your posted example, I don't see why you didn't use find() instead of find_all().
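
For the title case, a find() version might look like the sketch below. The HTML string is a made-up sample page and 'Untitled' is the placeholder from the original post; find() returns the first matching tag, or None when nothing matches, so no indexing is involved.

```python
from bs4 import BeautifulSoup

# Made-up sample page without the heading we are looking for.
html = '<html><body><p>No heading here</p></body></html>'
content = BeautifulSoup(html, 'html.parser')

# find() returns a single tag or None, so an IndexError cannot occur.
heading = content.find('h1', attrs={"itemprop": "title"})
title = heading.text if heading is not None else 'Untitled'
print(title)
```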

But if you did need to use find_all() you could do it like this.

if title_selector := content.findAll('h1', attrs={"itemprop": "title"}):
    title = title_selector[0].text
else:
    title = 'Untitled'