Dec-22-2022, 05:32 PM
Good afternoon,
I am working on a web scraping project using BeautifulSoup and, on occasion, Selenium webdriver.
BeautifulSoup's .findAll(...) method always returns a list, but its contents are not guaranteed and it may also be empty.
I thought of writing a custom function that handles IndexError, so that I don't need a try/except every time I scrape something.
from typing import Any
import inspect


def get_element(element, idx_error=None) -> Any:
    # TODO: This method does not work, idx error is not handled. FIX IT AS SOON AS POSSIBLE!!!
    value_ = None
    try:
        value_ = element
    except IndexError:
        # Fallback: call idx_error if it is a function, otherwise return it as the default value
        if inspect.isfunction(idx_error):
            idx_error()
        else:
            return idx_error
    return value_  # the value passed in, when no IndexError occurred

Basically, I want to return value_ ONLY if it exists; if the operation raises an IndexError, I want the idx_error value to be returned instead, or called if it is a function/method.
These data will then be used in a GUI, so I would rather get a placeholder value (e.g. 'untitled') than a blocking error.
Example usage:
title_selector = content.findAll('h1', attrs={"itemprop": "title"})  # Returns a list of page elements
title = get_element(title_selector[0].text, 'Untitled')  # Returns the text of the first element in title_selector if there is one; on IndexError returns 'Untitled'

Which should be equivalent to this:

title_selector = content.findAll('h1', attrs={"itemprop": "title"})
title = None
try:
    title = title_selector[0].text
except IndexError:
    title = 'Untitled'

Unfortunately, for some reason I can't figure out, it just doesn't do anything, and the IndexError is raised anyway.
Any ideas?
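Since get_element only ever receives an already-computed value, one common workaround is to pass a zero-argument callable (for example a lambda) instead, so the indexing happens inside the try. This is a sketch under the assumption that changing the call signature is acceptable; the name get_element_lazy and the getter parameter are hypothetical. It uses the built-in callable() instead of inspect.isfunction, because inspect.isfunction returns False for bound methods:

```python
from typing import Any, Callable


def get_element_lazy(getter: Callable[[], Any], idx_error: Any = None) -> Any:
    """Call getter(); if it raises IndexError, fall back to idx_error.

    If idx_error is itself callable, it is called and its result returned;
    otherwise it is returned as-is.
    """
    try:
        return getter()  # the indexing runs here, inside the try
    except IndexError:
        if callable(idx_error):
            return idx_error()
        return idx_error


empty: list = []        # stands in for an empty findAll(...) result
found = ["Some title"]  # stands in for a non-empty result

print(get_element_lazy(lambda: empty[0], 'Untitled'))     # Untitled
print(get_element_lazy(lambda: found[0], 'Untitled'))     # Some title
print(get_element_lazy(lambda: empty[0], lambda: 'n/a'))  # n/a
```

With BeautifulSoup, the call from the question would become get_element_lazy(lambda: title_selector[0].text, 'Untitled'). For this particular case, a conditional expression such as title_selector[0].text if title_selector else 'Untitled' also avoids the exception entirely, because an empty list is falsy.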