Python Forum
Getting 'list index out of range' while fetching product details using BeautifulSoup? - Printable Version




Getting 'list index out of range' while fetching product details using BeautifulSoup? - PrateekG - Jun-06-2018

Hi All,

I have written the following functions in Python 3.6 to fetch product details from the site:

def get_soup(url):
    soup = None
    try:
        response = requests.get(url)
        if response.status_code == 200:
            html = response.content
            soup = BeautifulSoup(html, "html.parser")
    except Exception as exc:
        print("Unable to fetch data due to..", str(exc))
    finally:
        return soup
def get_product_details(url):
    soup = get_soup(url)
    sleep(1)
    try:
        product_shop = soup.find('div', attrs={"class": "buy"})
        if product_shop is not None:
            available_product_shop = soup.findAll('div')[2].find('span').text == "In Stock"
            if available_product_shop is not None:
                prod_details = dict()
                merchant_product_id = soup.find('div', attrs={'class': 'description'}).findAll('span')[3].text
                if merchant_product_id is not None:
                    prod_details['merchant_product_id'] = merchant_product_id
                    check_title = soup.find('header', attrs={'class': 'product-name'}).find('h1')
                    if check_title is not None:
                        prod_details['title'] = check_title.text
                    check_description = soup.find('div', attrs={'id': 'tab-description'})
                    if check_description is not None:
                        prod_details['description'] = clean_description(check_description)
                    check_brand = soup.find('div', attrs={'class': 'description'}).findAll('span')[2].find('a')
                    if check_brand is not None:
                        prod_details['brand'] = check_brand.text
                    prod_details['google_product_category'] = CATEGORY_ID
                    prod_details['web_url'] = url
                    prod_details['merchant_image_urls'] = ",".join(list(filter(None, map(lambda x: x['href'].replace(",", "%2C"),
                                                                                         soup.find('div', attrs={
                                                                                             'class': 'left'}).findAll(
                                                                                             'a')))))
                    check_price = soup.find('span', attrs={"class": "price-old"})
                    if check_price is not None:
                        prod_details['price'] = check_price.text.replace("SGD $", "")
                    check_sale_price = soup.find('span', attrs={"class": "price-new"})
                    if check_sale_price is not None:
                        prod_details['sale_price'] = check_sale_price.text.replace("SGD $", "")
                    return prod_details
    except Exception as exc:
        print("Error..", str(exc))
def get_all_products(url):
    prod_urls = []
    soup = get_soup(url)
    prod_urls.append(get_product_urls(soup))

    links = get_pagination(soup)
    if not links:
        return prod_urls

    for link in links:
        soup = get_soup(link)
        prod_urls.append(get_product_urls(soup))

    print("Found following product urls:", prod_urls)
    return prod_urls

def get_product_urls(soup):
    links = soup.select('div.product-list .span .name a')
    if links is not None:
        return [link['href'] for link in links]

def get_pagination(soup):
    pages = soup.select('div.pagination div.links a')
    if pages is not None:
        return [link['href'] for link in pages if link.string.isdecimal()]

def get_category_urls(url):
    soup = get_soup(url)
    cat_urls = []
    try:
        categories = soup.find('div', attrs={'id': 'menu_oc'})
        if categories is not None:
            for c in categories.findAll('a'):
                if c['href'] is not None:
                    cat_urls.append(c['href'])
    except Exception as exc:
        print("Unable to fetch category urls due to..", str(exc))
    finally:
        print("Found following category urls::", cat_urls)
        return cat_urls
def flatten(items):
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x

if __name__ == '__main__':
    category_urls = get_category_urls(URL)

    with Pool(20) as p:
        product_urls = p.map(get_all_products, category_urls)
    #product_urls = list(filter(None, product_urls))
    product_urls_flat =list(flatten(product_urls))

    with Pool(20) as p:
        products = p.map(get_product_details, product_urls_flat)
    products = list(filter(None, products))
    products_df = pd.DataFrame(products)
    print(products_df.head())
When I run the above code, I get the following issues:
1. 'list index out of range' in the get_product_details() function.
2. Incorrect values for brand, image URLs, and product ID.

Can anyone please run my code and share a corrected version of it?


RE: Getting 'list index out of range' while fetching product details using BeautifulSoup? - PrateekG - Jun-06-2018

I have shared the entire code so that you can run it on your machine and find out what I am doing wrong.


RE: Getting 'list index out of range' while fetching product details using BeautifulSoup? - buran - Jun-06-2018

You are the one who needs help. Help us to help you. I don't want to run your code to find the error myself. If nothing else, I may have a different setup... You must provide the full traceback in error tags...
Also, URL is not defined in your code, so I cannot run it, even if I were willing to do so. Which I am not....


RE: Getting 'list index out of range' while fetching product details using BeautifulSoup? - PrateekG - Jun-06-2018

Code-
def get_product_details(url):
    soup = get_soup(url)
    sleep(1)
    try:
        product_shop = soup.find('div', attrs={"class": "buy"})
        if product_shop is not None:
            available_product_shop = soup.findAll('div')[2].find('span').text == "In Stock"
            if available_product_shop is not None:
                prod_details = dict()
                check_merchant_product_id = soup.find('div', attrs={'class': 'description'}).findAll('span')[3]
                if check_merchant_product_id is not None:
                    merchant_product_id = check_merchant_product_id.text
                if merchant_product_id is not None:
                    prod_details['merchant_product_id'] = merchant_product_id
                    check_title = soup.find('header', attrs={'class': 'product-name'}).find('h1')
                    if check_title is not None:
                        prod_details['title'] = check_title.text
                    check_description = soup.find('div', attrs={'id': 'tab-description'})
                    if check_description is not None:
                        prod_details['description'] = clean_description(check_description)
                    check_brand = soup.find('div', attrs={'class': 'description'}).find('a')
                    if check_brand is not None:
                        prod_details['brand'] = check_brand.text
                    prod_details['google_product_category'] = CATEGORY_ID
                    prod_details['status'] = 'A'
                    prod_details['web_url'] = url
                    prod_details['merchant_image_urls'] = ",".join(list(filter(None, map(lambda x: x['href'].replace(",", "%2C"),
                                                                                         soup.find('div', attrs={
                                                                                             'class': 'left'}).findAll(
                                                                                             'a')))))
                    print("images::::",prod_details['merchant_image_urls'])
                    check_price = soup.find('span', attrs={"class": "price-old"})
                    if check_price is not None:
                        prod_details['price'] = check_price.text.replace("SGD $", "")
                    check_sale_price = soup.find('span', attrs={"class": "price-new"})
                    if check_sale_price is not None:
                        prod_details['sale_price'] = check_sale_price.text.replace("SGD $", "")

                    return prod_details

    except requests.ConnectionError as cexc:
        print("Connection Error! Make sure you are connected to Internet.\n", str(cexc))
    except requests.Timeout as texc:
        print("Timeout Error.\n", str(texc))
    except requests.RequestException as rexc:
        print("The following request exception occurred.\n", str(rexc))
    except Exception as exc:
        print("Unable to fetch product details due to..", str(exc))
Output-
Output:
Unable to fetch product details due to.. list index out of range
Unable to fetch product details due to.. list index out of range
Unable to fetch product details due to.. list index out of range
Unable to fetch product details due to.. list index out of range
Unable to fetch product details due to.. list index out of range
Unable to fetch product details due to.. list index out of range
Unable to fetch product details due to.. list index out of range
Finished extracting data, the process took 0:06:54.281451
  | brand | description | google_product_category | merchant_image_urls | merchant_product_id | price | sale_price | status | title | web_url
0 | Clearance | <article><p>An atmosphere of enchanted stars and distant galaxies soothes baby in the cot thanks to the innovative system of double projection. Th... | Infantree | http://www.infantree.net/shop/image/cache/data/Chicco Products/CC024421_Magic Star Cot Mobile - Pink(1) (Copy)-500x500.jpg,http://www.infantree.ne... | Product Code: | 119.00 | 79.00 | A | CHICCO Magic Star Cot Mobile - Pink | http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=450
1 | Clearance | <article><p>Chicco Sweet Dream Musical Cot Mobile has a rotational motor that gently spins while playing a musical melody. The mobile has three cu... | Infantree | http://www.infantree.net/shop/image/cache/data/Chicco Products/CC917298_CHICCO Natural Colors Cot Mobile - Sweet Dreams(1) (Copy)-500x500.jpg,http... | Product Code: | 79.00 | 47.40 | A | CHICCO Natural Colors Cot Mobile - Sweet Dreams | http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=452
2 | The First Years | <article><p>Product Description<br> Research has shown the huge importance of ensuring babies are correctly positioned when put down to sleep. In ... | Infantree | http://www.infantree.net/shop/image/cache/data/TFY Products/Airflow/Air flow sleep positioner 5 3inch-500x500.jpg,http://www.infantree.net/shop/i... | Product Code: | NaN | NaN | A | THE FIRST YEARS Air-Flow Sleep Positioner 5 Inch | http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=285
3 | The First Years | <article><p>Research has shown the huge importance of ensuring babies are correctly positioned when put down to sleep. In fact, the danger of Sudd... | Infantree | http://www.infantree.net/shop/image/cache/data/TFY Products/Airflow/airflow posotioner w wedge3-500x500.jpg,http://www.infantree.net/shop/image/ca... | Product Code: | NaN | NaN | A | THE FIRST YEARS Airflow Positioner with Wedge | http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=284
4 | The First Years | <article><p>The First Years deceptively simple Close &amp; Secure Sleeper allows you to feed, soothe, monitor, and bond with baby in the comfort o... | Infantree | http://www.infantree.net/shop/image/cache/data/TFY Products/TFY3171-1-500x500.jpg,http://www.infantree.net/shop/image/cache/data/TFY Products/TFY3... | Product Code: | NaN | NaN | A | THE FIRST YEARS Close & Secure Sleeper | http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=287
5 | The First Years | <article><p></p><p>Enjoy having everything you need to care for and groom your baby, all in 1 convenient storage case.</p><p>The Deluxe Baby Groom... | Infantree | http://www.infantree.net/shop/image/cache/data/TFY Products/tfy7056_2 (Copy)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/TFY Produc... | Product Code: | NaN | NaN | A | THE FIRST YEARS Deluxe Grooming Kit | http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=908
6 | The First Years Disney | <article><p>This soft and soothing plush toy comes with melodies and a mother's fetal sound to help relax and lull your baby into a deep and peace... | Infantree | http://www.infantree.net/shop/image/cache/data/Tomy Disney/456933_2-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Tomy Disney/456933_... | Product Code: | 79.00 | 59.90 | A | Tomy Disney Suya Suya Melody Baby Mickey | http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=946
7 | The First Years Disney | <article><p>This soft and soothing plush toy comes with melodies and a mother's fetal sound to help relax and lull your baby into a deep and peace... | Infantree | http://www.infantree.net/shop/image/cache/data/Tomy Disney/TD456940-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Tomy Disney/TD45694... | Product Code: | 79.00 | 59.90 | A | Tomy Disney Suya Suya Melody Baby Minnie | http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=1122
8 | Zibos | <article><p><img></p><p><strong>Zibos Ala Bedside Crib - Blue (With Travel Bag &amp; Mosquito Net)</strong></p><ul><li>Suitable from birth - 6 mon... | Infantree | http://www.infantree.net/shop/image/cache/data/Zibos/blue-1 (Copy)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos/blue-1 (Copy)-... | Product Code: | NaN | NaN | A | Zibos Ala Bedside Crib - Blue (With Travel Bag & Mosquito Net) | http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=857
9 | Zibos | <article><p><img></p><p><strong>Zibos Ala Bedside Crib - Blue (With Travel Bag &amp; Mosquito Net)</strong></p><ul><li>Suitable from birth - 6 mon... | Infantree | http://www.infantree.net/shop/image/cache/data/Zibos/sand-1 (Copy)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos/sand-1 (Copy)-... | Product Code: | NaN | NaN | A | Zibos Ala Bedside Crib - Sand (With Travel Bag & Mosquito Net) | http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=859
10 | Zibos | <article><h2><u>Zibos Ama Bedside Crib - Grey (With Travel Bag &amp; Mosquito Net)</u></h2><ul><li>Suitable from birth - 6 months</li><li>Designed... | Infantree | http://www.infantree.net/shop/image/cache/data/Zibos/Ama - Britain Grey (rocking)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos... | Product Code: | NaN | NaN | A | Zibos Ama Bedside Crib - Grey (With Travel Bag & Mosquito Net) | http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=1133
11 | Zibos | <article><h2><u>Zibos Ama Bedside Crib - Sand (With Travel Bag &amp; Mosquito Net)</u></h2><ul><li>Suitable from birth - 6 months</li><li>Designed... | Infantree | http://www.infantree.net/shop/image/cache/data/Zibos/Ama - Britain Sand (rocking)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos... | Product Code: | NaN | NaN | A | Zibos Ama Bedside Crib - Sand (With Travel Bag & Mosquito Net) | http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=1134
12 | Zibos | <article><p><img></p><p><strong>Zibos Ala Bedside Crib - Blue (With Travel Bag &amp; Mosquito Net)</strong></p><ul><li>Suitable from birth - 6 mon... | Infantree | http://www.infantree.net/shop/image/cache/data/Zibos/blue-1 (Copy)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos/blue-1 (Copy)-... | Product Code: | NaN | NaN | A | Zibos Ala Bedside Crib - Blue (With Travel Bag & Mosquito Net) | http://www.infantree.net/shop/index.php?route=product/product&path=71_73&product_id=857
13 | Zibos | <article><p><img></p><p><strong>Zibos Ala Bedside Crib - Blue (With Travel Bag &amp; Mosquito Net)</strong></p><ul><li>Suitable from birth - 6 mon... | Infantree | http://www.infantree.net/shop/image/cache/data/Zibos/sand-1 (Copy)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos/sand-1 (Copy)-... | Product Code: | NaN | NaN | A | Zibos Ala Bedside Crib - Sand (With Travel Bag & Mosquito Net) | http://www.infantree.net/shop/index.php?route=product/product&path=71_73&product_id=859
14 | Zibos | <article><h2><u>Zibos Ama Bedside Crib - Grey (With Travel Bag &amp; Mosquito Net)</u></h2><ul><li>Suitable from birth - 6 months</li><li>Designed... | Infantree | http://www.infantree.net/shop/image/cache/data/Zibos/Ama - Britain Grey (rocking)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos... | Product Code: | NaN | NaN | A | Zibos Ama Bedside Crib - Grey (With Travel Bag & Mosquito Net) | http://www.infantree.net/shop/index.php?route=product/product&path=71_73&product_id=1133
15 | Zibos | <article><h2><u>Zibos Ama Bedside Crib - Sand (With Travel Bag &amp; Mosquito Net)</u></h2><ul><li>Suitable from birth - 6 months</li><li>Designed... | Infantree | http://www.infantree.net/shop/image/cache/data/Zibos/Ama - Britain Sand (rocking)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos... | Product Code: | NaN | NaN | A | Zibos Ama Bedside Crib - Sand (With Travel Bag & Mosquito Net) | http://www.infantree.net/shop/index.php?route=product/product&path=71_73&product_id=1134
16 | The First Years | <article><p>The First Years deceptively simple Close &amp; Secure Sleeper allows you to feed, soothe, monitor, and bond with baby in the comfort o... | Infantree | http://www.infantree.net/shop/image/cache/data/TFY Products/TFY3171-1-500x500.jpg,http://www.infantree.net/shop/image/cache/data/TFY Products/TFY3... | Product Code: | NaN | NaN | A | THE FIRST YEARS Close & Secure Sleeper | http://www.infantree.net/shop/index.php?route=product/product&path=71_74&product_id=287
17 | Clearance | <article><p>An atmosphere of enchanted stars and distant galaxies soothes baby in the cot thanks to the innovative system of double projection. Th... | Infantree | http://www.infantree.net/shop/image/cache/data/Chicco Products/CC024421_Magic Star Cot Mobile - Pink(1) (Copy)-500x500.jpg,http://www.infantree.ne... | Product Code: | 119.00 | 79.00 | A | CHICCO Magic Star Cot Mobile - Pink | http://www.infantree.net/shop/index.php?route=product/product&path=71_91&product_id=450
18 | Clearance | <article><p>Chicco Sweet Dream Musical Cot Mobile has a rotational motor that gently spins while playing a musical melody. The mobile has three cu... | Infantree | http://www.infantree.net/shop/image/cache/data/Chicco Products/CC917298_CHICCO Natural Colors Cot Mobile - Sweet Dreams(1) (Copy)-500x500.jpg,http... | Product Code: | 79.00 | 47.40 | A | CHICCO Natural Colors Cot Mobile - Sweet Dreams | http://www.infantree.net/shop/index.php?route=product/product&path=71_91&product_id=452
19 | The First Years Disney | <article><p>The one and only musical mobile for all cots, bedside cribs, playpens, etc. You don't have to worry if the mobile can be attached to t... | Infantree | http://www.infantree.net/shop/image/cache/data/Tomy Disney/429579-500x500.jpeg,http://www.infantree.net/shop/image/cache/data/Tomy Disney/429579-5... | Product Code: | NaN | NaN | A | Tomy Disney Home Theatre (Disney Character) | http://www.infantree.net/shop/index.php?route=product/product&path=71_91&product_id=1056
Process finished with exit code 0



RE: Getting 'list index out of range' while fetching product details using BeautifulSoup? - buran - Jun-06-2018

(Jun-06-2018, 04:53 AM)PrateekG Wrote: 1. 'list index out of range' in get_product_details() function.
I don't see this error in your output.... When you are not able to debug, don't mask the full traceback...
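To illustrate the point about not masking the traceback: printing only str(exc) hides the file, line, and call chain that raised the error. A minimal sketch using the standard-library traceback module follows; the function third_item and its input list are hypothetical stand-ins for the indexing done in get_product_details().

```python
import traceback


def third_item(items):
    # Hypothetical stand-in for an unguarded lookup such as findAll('span')[3].
    return items[2]


def run():
    try:
        third_item(["only", "two"])
    except Exception:
        # format_exc() returns the full traceback of the exception being
        # handled, including the function name and the failing line number.
        return traceback.format_exc()
    return ""


report = run()
print(report)
```

Posting this kind of output, rather than only the exception message, shows which lookup failed.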


RE: Getting 'list index out of range' while fetching product details using BeautifulSoup? - PrateekG - Jun-06-2018

First Issue-
check_merchant_product_id = soup.find('div', attrs={'class': 'description'}).findAll('span')[3]
if check_merchant_product_id is not None:
    merchant_product_id = check_merchant_product_id.text
if merchant_product_id is not None:
    prod_details['merchant_product_id'] = merchant_product_id
Error:
Getting 'Product Code:' instead of the value.
Second Issue-
prod_details['merchant_image_urls'] = ",".join(list(filter(None, map(
    lambda x: x['href'].replace(",", "%2C"),
    soup.find('div', attrs={'class': 'left'}).findAll('a')))))
Error:
Getting broken image URLs.
Third Issue-
check_price = soup.find('span', attrs={"class": "price-old"})
if check_price is not None:
    prod_details['price'] = check_price.text.replace("SGD $", "")
check_sale_price = soup.find('span', attrs={"class": "price-new"})
if check_sale_price is not None:
    prod_details['sale_price'] = check_sale_price.text.replace("SGD $", "")
Error:
Getting NaN for some products.
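Two of these symptoms can be sketched without the site itself. Indexing a result list with a fixed position such as [3] raises 'list index out of range' whenever a page has fewer matching elements, and the image URLs in the posted output contain unencoded spaces, which is one plausible reason they appear broken. The helpers below are hypothetical (nth_or_none and encode_image_url are not part of the original code) and use only the standard library; the example.com URL is a placeholder.

```python
from urllib.parse import quote


def nth_or_none(items, index):
    """Return items[index], or None when the index is out of range."""
    try:
        return items[index]
    except IndexError:
        return None


def encode_image_url(url):
    # Percent-encode spaces, commas, and other unsafe characters while
    # keeping ':' and '/' so the scheme and path separators stay intact.
    return quote(url, safe=":/")


spans = ["Brand:", "Product Code:"]          # fewer entries than expected
print(nth_or_none(spans, 3))                 # no exception is raised
print(encode_image_url("http://example.com/data/My Product (Copy).jpg"))
```

With a guard like this, the `is not None` checks in the posted code become meaningful, because a missing element yields None instead of an exception. This does not explain why the span at that position holds the label 'Product Code:' rather than the value; that depends on the page's markup, which is not shown in the thread.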



RE: Getting 'list index out of range' while fetching product details using BeautifulSoup? - buran - Jun-06-2018

I give up... Sorry, but you are a hopeless case...


RE: Getting 'list index out of range' while fetching product details using BeautifulSoup? - PrateekG - Jun-06-2018

(Jun-06-2018, 09:29 AM)buran Wrote: You are the one who needs help. Help us to help you. I don't want to run your code to find the error myself. If nothing else, I may have a different setup... You must provide the full traceback in error tags...
Also, URL is not defined in your code, so I cannot run it, even if I were willing to do so. Which I am not....

I have posted my code, logs, and issues.
I hope you can help me now!

It seems you are not interested in helping, but rather in creating issues.

(Jun-06-2018, 10:21 AM)buran Wrote:
(Jun-06-2018, 04:53 AM)PrateekG Wrote: 1. 'list index out of range' in get_product_details() function.
I don't see this error in your output.... When you are not able to debug, don't mask the full traceback...

Just look at the first line of the output instead of creating an issue.
It clearly says: list index out of range


RE: Getting 'list index out of range' while fetching product details using BeautifulSoup? - snippsat - Jun-06-2018

The whole thing is messy, as pointed out by @buran.
It's not code that we can run; it's missing the imports and the call to the function.
attrs={"class": "buy"}
Do you find a class='buy' on that site?

Just to show a quick test, with code that can be run (no missing parts).
It takes out the names of the featured products; class_='price' would take out the price.
import requests
from bs4 import BeautifulSoup

url = 'http://www.infantree.net/shop/'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
#print(soup)
featured = soup.find('section', class_="featured span")
li = featured.find_all('li')
for item in li:
    print(item.find('div', class_='name').text.strip())
Output:
THE FIRST YEARS Bottle Warmer and Cooler
THE FIRST YEARS Steam Sterilizer
AMEDA Store'N Pour Breast Milk Storage Bags
AMEDA Lactaline Personal Breastpump
LASCAL Kiddy Guard Accent - Black
Do a small test like this first to make sure that everything works.
If you get an error in this short code, it's easy to see what's wrong.
Then you can structure it into functions (not too much code in one function).
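As a sketch of the "small function, small test" advice applied to the price fields from the thread: the helper below is hypothetical (parse_price is not in the original code) and assumes prices arrive as text such as "SGD $119.00", the prefix the posted code strips. It returns None when the text is missing or not numeric, so a product without a price is an explicit case rather than a surprise.

```python
def parse_price(text, prefix="SGD $"):
    """Convert price text such as 'SGD $119.00' to a float, or None."""
    if text is None:
        return None
    # Drop the currency prefix, thousands separators, and whitespace.
    cleaned = text.replace(prefix, "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        # Empty or non-numeric text: report "no price" instead of failing.
        return None


# Quick checks that can be run before wiring the function into a scraper.
print(parse_price("SGD $119.00"))
print(parse_price(None))
```

Each field of the product page can get a function of this size, which keeps an error in one field from hiding the others.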