Hi Expert,
I have fetched data from HTML using the code below:
def get_soup(url):
    response = requests.get(url)
    html = response.content
    return BeautifulSoup(html, "html.parser")

And I have fetched the category URLs with:
def get_category_urls(url):
    soup = get_soup(url)
    cat_urls = []
    try:
        categories = soup.find('div', attrs={'id': 'menu_oc'})
        if categories is not None:
            for c in categories.findAll('a'):
                if c['href'] is not None:
                    cat_urls.append(c['href'])
    except Exception as exc:
        print("error::" + url + str(exc))
    finally:
        return cat_urls

Now I am trying to fetch the product URLs with the code below:
def get_product_urls(url):
    soup = get_soup(url)
    prod_urls = []
    try:
        if soup.find('div', attrs={'class': 'pagination'}):
            pages = soup.find('div', attrs={'class': 'page'}).text.split("of ", 1)[1].replace(' (1 Pages)', '')
            if pages is not None:
                for page in range(1, int(pages) + 1):
                    soup_with_page = get_soup(url + "&page={}".format(page))
                    product_urls_soup = soup_with_page.find('div', attrs={'id': 'carousel-featured-0'})
                    if product_urls_soup is not None:
                        for row in product_urls_soup.findAll('a'):
                            if row['href'] is not None:
                                prod_urls.append(row['href'])
    except Exception as exc:
        # print the URL here, not the list, to avoid a TypeError when concatenating
        print("error:: " + url + ": " + str(exc))
    finally:
        return prod_urls
if __name__ == '__main__':
    with Pool(2) as p:
        product_urls = p.map(get_product_urls, category_urls)
    product_urls = list(filter(None, product_urls))
    product_urls_flat = list(set([y for x in product_urls for y in x]))

I am getting product_urls_soup as None here. What am I doing wrong? Please find the sample HTML data below:
html data
How do I handle pagination here, since some categories have pagination and some do not?
Finally I found the issue.
I was not checking pagination for all categories, and that is why I was getting the problem.
I was able to solve it by adding a check for pagination.
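The check described above can be sketched as a small helper, assuming the same markup as in the question (the `pagination`/`page` class names and the "... of N (1 Pages)" text format); `get_page_count` is a hypothetical name introduced here for illustration:

```python
from bs4 import BeautifulSoup


def get_page_count(soup):
    """Return the number of pages for a category listing.

    A category without a <div class="pagination"> block is treated as a
    single page; otherwise the count is parsed from the page-info text,
    mirroring the parsing used in the question.
    """
    if soup.find('div', attrs={'class': 'pagination'}) is None:
        return 1  # no pagination block: single-page category
    pages = (soup.find('div', attrs={'class': 'page'})
                 .text.split("of ", 1)[1]
                 .replace(' (1 Pages)', ''))
    return int(pages)
```

With this in place, `get_product_urls` can loop `for page in range(1, get_page_count(soup) + 1)` for every category, whether it is paginated or not, instead of skipping the un-paginated ones.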