Dec-22-2022, 05:32 PM
Good afternoon,
I am working on a web scraping project using BeautifulSoup and, on occasion, Selenium webdriver.
BeautifulSoup's .findAll(...) method always returns a list, but its contents are not guaranteed: it might also be empty.
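Since an empty list is falsy in Python, the empty case can be detected before indexing, with no exception involved at all. Below is a minimal sketch. `FakeTag` is a hypothetical stand-in for a BeautifulSoup tag (only its `.text` attribute is used), and `first_text` is a hypothetical helper name. In BeautifulSoup itself, the singular `find(...)` is another option, because it returns `None` rather than an empty list when nothing matches.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup Tag; only .text is used."""
    text: str


def first_text(elements: Sequence[FakeTag], default: str = 'Untitled') -> str:
    """Return the .text of the first element, or `default` if there is none."""
    # An empty list is falsy, so elements[0] is never evaluated
    # (and no IndexError can occur) when findAll(...) matched nothing.
    return elements[0].text if elements else default


print(first_text([]))                     # Untitled
print(first_text([FakeTag('My title')]))  # My title
```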
I thought I would write a custom method to handle IndexError, so that I don't need a try/except every time I scrape something.
```python
import inspect
from typing import Any


def get_element(element, idx_error=None) -> Any:
    # TODO: This method does not work, idx error is not handled. FIX IT AS SOON AS POSSIBLE!!!
    value_ = None
    try:
        value_ = element
    except IndexError:
        if inspect.isfunction(idx_error):
            idx_error()
        else:
            return idx_error
```
Basically, I want to return value_ ONLY if it exists; if the operation raises an IndexError exception, then I want the idx_error value to be returned, or run in case it is a function/method.
These data will then be used for a GUI, so I would rather get a non-significant value (e.g. 'untitled') than a blocking error.
Example usage:
```python
title_selector = content.findAll('h1', attrs={"itemprop": "title"})  # Returns a list of page elements
title = get_element(title_selector[0].text, 'Untitled')  # Returns first element in title_selector if there is one; if IndexError returns 'Untitled'
```
Which should be the equivalent of this:
```python
title_selector = content.findAll('h1', attrs={"itemprop": "title"})
title = None
try:
    title = title_selector[0].text
except IndexError:
    title = 'Untitled'
```
Unfortunately, for some reason I can't figure out, it just doesn't do anything, and the IndexError is thrown anyway.
Any ideas?
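A likely cause: Python evaluates a call's arguments before the function body runs. In `get_element(title_selector[0].text, 'Untitled')`, the expression `title_selector[0]` therefore raises IndexError at the call site, before the `try` inside `get_element` is ever entered; that `try` only wraps `value_ = element`, a plain assignment that cannot raise IndexError. (The helper as posted also never returns `value_`, so even when an element exists it returns None.) One way around this, sketched below under the assumption that callers can pass a zero-argument lambda, is to defer the indexing so it runs inside the `try`. `FakeTag` is a hypothetical stand-in for a BeautifulSoup tag, and `callable()` is swapped in for `inspect.isfunction()` so that bound methods also count as fallbacks (`inspect.isfunction()` returns False for them).

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup Tag; only .text is used."""
    text: str


def get_element(getter: Callable[[], Any], idx_error: Any = None) -> Any:
    """Return getter(); on IndexError, fall back to idx_error.

    getter must be a zero-argument callable (e.g. a lambda) so that the
    indexing runs inside the try block instead of at the call site.
    If idx_error is callable, it is called and its result is returned;
    otherwise idx_error itself is returned.
    """
    try:
        return getter()
    except IndexError:
        if callable(idx_error):
            return idx_error()
        return idx_error


title_selector = []  # what findAll(...) returns when nothing matches
title = get_element(lambda: title_selector[0].text, 'Untitled')
print(title)  # Untitled

title_selector = [FakeTag('My title')]
title = get_element(lambda: title_selector[0].text, 'Untitled')
print(title)  # My title
```

Passing the list and an index instead of a lambda would work just as well; the lambda simply keeps each call site close to the original `title_selector[0].text` expression. Note that with `callable()`, passing a class such as `str` as the fallback would call it.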