Python Forum
Getting 'list index out of range' while fetching product details using BeautifulSoup?
#1
Hi All,

I have written the following functions in Python 3.6 to fetch product details from the site:

def get_soup(url):
    soup = None
    try:
        response = requests.get(url)
        if response.status_code == 200:
            html = response.content
            soup = BeautifulSoup(html, "html.parser")
    except Exception as exc:
        print("Unable to fetch data due to..", str(exc))
    finally:
        return soup
def get_product_details(url):
    soup = get_soup(url)
    sleep(1)
    try:
        product_shop = soup.find('div', attrs={"class": "buy"})
        if product_shop is not None:
            available_product_shop = soup.findAll('div')[2].find('span').text == "In Stock"
            if available_product_shop is not None:
                prod_details = dict()
                merchant_product_id = soup.find('div', attrs={'class': 'description'}).findAll('span')[3].text
                if merchant_product_id is not None:
                    prod_details['merchant_product_id'] = merchant_product_id
                    check_title = soup.find('header', attrs={'class': 'product-name'}).find('h1')
                    if check_title is not None:
                        prod_details['title'] = check_title.text
                    check_description = soup.find('div', attrs={'id': 'tab-description'})
                    if check_description is not None:
                        prod_details['description'] = clean_description(check_description)
                    check_brand = soup.find('div', attrs={'class': 'description'}).findAll('span')[2].find('a')
                    if check_brand is not None:
                        prod_details['brand'] = check_brand.text
                    prod_details['google_product_category'] = CATEGORY_ID
                    prod_details['web_url'] = url
                    prod_details['merchant_image_urls'] = ",".join(list(filter(None, map(lambda x: x['href'].replace(",", "%2C"),
                                                                                         soup.find('div', attrs={
                                                                                             'class': 'left'}).findAll(
                                                                                             'a')))))
                    check_price = soup.find('span', attrs={"class": "price-old"})
                    if check_price is not None:
                        prod_details['price'] = check_price.text.replace("SGD $", "")
                    check_sale_price = soup.find('span', attrs={"class": "price-new"})
                    if check_sale_price is not None:
                        prod_details['sale_price'] = check_sale_price.text.replace("SGD $", "")
                    return prod_details
    except Exception as exc:
        print("Error..", str(exc))
def get_all_products(url):
    prod_urls = []
    soup = get_soup(url)
    prod_urls.append(get_product_urls(soup))

    links = get_pagination(soup)
    if not links:
        return prod_urls

    for link in links:
        soup = get_soup(link)
        prod_urls.append(get_product_urls(soup))

    print("Found following product urls:", prod_urls)
    return prod_urls

def get_product_urls(soup):
    links = soup.select('div.product-list .span .name a')
    if links is not None:
        return [link['href'] for link in links]

def get_pagination(soup):
    pages = soup.select('div.pagination div.links a')
    if pages is not None:
        return [link['href'] for link in pages if link.string.isdecimal()]

def get_category_urls(url):
    soup = get_soup(url)
    cat_urls = []
    try:
        categories = soup.find('div', attrs={'id': 'menu_oc'})
        if categories is not None:
            for c in categories.findAll('a'):
                if c['href'] is not None:
                    cat_urls.append(c['href'])
    except Exception as exc:
        print("Unable to fetch category urls due to..", str(exc))
    finally:
        print("Found following category urls::", cat_urls)
        return cat_urls
def flatten(items):
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x

if __name__ == '__main__':
    category_urls = get_category_urls(URL)

    with Pool(20) as p:
        product_urls = p.map(get_all_products, category_urls)
    #product_urls = list(filter(None, product_urls))
    product_urls_flat =list(flatten(product_urls))

    with Pool(20) as p:
         products = p.map(get_product_details, product_urls_flat)
    products = list(filter(None, products))
    products_df = pd.DataFrame(products)
    print(products_df.head())
When I run the above code, I get the following issues:
1. 'list index out of range' in the get_product_details() function.
2. The brand, image URLs, and product ID values are not correct.

Can anyone please run my code and share a corrected version of it?
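
For context, a likely cause of the first issue is one of the hard-coded indexes in get_product_details() (`findAll('div')[2]`, `findAll('span')[3]`, `findAll('span')[2]`): when a page has fewer matching tags, the index does not exist. Separately, `... == "In Stock"` produces True or False, so the following `is not None` check always passes. A minimal sketch of both points, using plain lists in place of what `findAll` returns; `nth` is a hypothetical helper and the values are made up:

```python
def nth(items, index, default=None):
    """Return items[index], or default when that index does not exist."""
    try:
        return items[index]
    except IndexError:
        return default


# Plain lists stand in for the result of findAll('span') on two different pages.
spans_full = ["Brand:", "Some Brand", "Product Code:", "ABC123"]
spans_short = ["Brand:", "Some Brand"]

print(nth(spans_full, 3))   # the fourth item exists
print(nth(spans_short, 3))  # None instead of an IndexError

# A comparison is always True or False, never None,
# so test the boolean itself rather than `is not None`.
in_stock = nth(spans_short, 1) == "In Stock"
if in_stock:
    print("product is in stock")
else:
    print("product is not in stock")
```

The same helper should work on the list-like result of `findAll`, since it only relies on indexing.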
Reply
#2
I have shared the entire code so that you can run it on your machine and find out what I am doing wrong.
Reply
#3
You are the one that needs help. Help us to help you. I don't want to run your code to find the error myself. If for nothing else, I may have a different setup... You must provide the full traceback in error tags...
Also, URL is not defined in your code, so I cannot run it, even if I were willing to do so. Which I am not....
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#4
Code-
def get_product_details(url):
    soup = get_soup(url)
    sleep(1)
    try:
        product_shop = soup.find('div', attrs={"class": "buy"})
        if product_shop is not None:
            available_product_shop = soup.findAll('div')[2].find('span').text == "In Stock"
            if available_product_shop is not None:
                prod_details = dict()
                check_merchant_product_id = soup.find('div', attrs={'class': 'description'}).findAll('span')[3]
                if check_merchant_product_id is not None:
                    merchant_product_id = check_merchant_product_id.text
                if merchant_product_id is not None:
                    prod_details['merchant_product_id'] = merchant_product_id
                    check_title = soup.find('header', attrs={'class': 'product-name'}).find('h1')
                    if check_title is not None:
                        prod_details['title'] = check_title.text
                    check_description = soup.find('div', attrs={'id': 'tab-description'})
                    if check_description is not None:
                        prod_details['description'] = clean_description(check_description)
                    check_brand = soup.find('div', attrs={'class': 'description'}).find('a')
                    if check_brand is not None:
                        prod_details['brand'] = check_brand.text
                    prod_details['google_product_category'] = CATEGORY_ID
                    prod_details['status'] = 'A'
                    prod_details['web_url'] = url
                    prod_details['merchant_image_urls'] = ",".join(list(filter(None, map(lambda x: x['href'].replace(",", "%2C"),
                                                                                         soup.find('div', attrs={
                                                                                             'class': 'left'}).findAll(
                                                                                             'a')))))
                    print("images::::",prod_details['merchant_image_urls'])
                    check_price = soup.find('span', attrs={"class": "price-old"})
                    if check_price is not None:
                        prod_details['price'] = check_price.text.replace("SGD $", "")
                    check_sale_price = soup.find('span', attrs={"class": "price-new"})
                    if check_sale_price is not None:
                        prod_details['sale_price'] = check_sale_price.text.replace("SGD $", "")

                    return prod_details

    except requests.ConnectionError as cexc:
        print("Connection Error! Make sure you are connected to Internet.\n", str(cexc))
    except requests.Timeout as texc:
        print("Timeout Error.\n", str(texc))
    except requests.RequestException as rexc:
        print("The following Request Exception occurred.\n", str(rexc))
    except Exception as exc:
        print("Unable to fetch product details due to..", str(exc))
Output-
Output:
Unable to fetch product details due to.. list index out of range
Unable to fetch product details due to.. list index out of range
Unable to fetch product details due to.. list index out of range
Unable to fetch product details due to.. list index out of range
Unable to fetch product details due to.. list index out of range
Unable to fetch product details due to.. list index out of range
Unable to fetch product details due to.. list index out of range
Finished extracting data, the process took 0:06:54.281451
brand description google_product_category merchant_image_urls merchant_product_id price sale_price status title web_url
0 Clearance <article><p>An atmosphere of enchanted stars and distant galaxies soothes baby in the cot thanks to the innovative system of double projection. Th... Infantree http://www.infantree.net/shop/image/cache/data/Chicco Products/CC024421_Magic Star Cot Mobile - Pink(1) (Copy)-500x500.jpg,http://www.infantree.ne... Product Code: 119.00 79.00 A CHICCO Magic Star Cot Mobile - Pink http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=450
1 Clearance <article><p>Chicco Sweet Dream Musical Cot Mobile has a rotational motor that gently spins while playing a musical melody. The mobile has three cu... Infantree http://www.infantree.net/shop/image/cache/data/Chicco Products/CC917298_CHICCO Natural Colors Cot Mobile - Sweet Dreams(1) (Copy)-500x500.jpg,http... Product Code: 79.00 47.40 A CHICCO Natural Colors Cot Mobile - Sweet Dreams http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=452
2 The First Years <article><p>Product Description<br> Research has shown the huge importance of ensuring babies are correctly positioned when put down to sleep. In ... Infantree http://www.infantree.net/shop/image/cache/data/TFY Products/Airflow/Air flow sleep positioner 5 3inch-500x500.jpg,http://www.infantree.net/shop/i... Product Code: NaN NaN A THE FIRST YEARS Air-Flow Sleep Positioner 5 Inch http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=285
3 The First Years <article><p>Research has shown the huge importance of ensuring babies are correctly positioned when put down to sleep. In fact, the danger of Sudd... Infantree http://www.infantree.net/shop/image/cache/data/TFY Products/Airflow/airflow posotioner w wedge3-500x500.jpg,http://www.infantree.net/shop/image/ca... Product Code: NaN NaN A THE FIRST YEARS Airflow Positioner with Wedge http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=284
4 The First Years <article><p>The First Years deceptively simple Close &amp; Secure Sleeper allows you to feed, soothe, monitor, and bond with baby in the comfort o... Infantree http://www.infantree.net/shop/image/cache/data/TFY Products/TFY3171-1-500x500.jpg,http://www.infantree.net/shop/image/cache/data/TFY Products/TFY3... Product Code: NaN NaN A THE FIRST YEARS Close & Secure Sleeper http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=287
5 The First Years <article><p></p><p>Enjoy having everything you need to care for and groom your baby, all in 1 convenient storage case.</p><p>The Deluxe Baby Groom... Infantree http://www.infantree.net/shop/image/cache/data/TFY Products/tfy7056_2 (Copy)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/TFY Produc... Product Code: NaN NaN A THE FIRST YEARS Deluxe Grooming Kit http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=908
6 The First Years Disney <article><p>This soft and soothing plush toy comes with melodies and a mother's fetal sound to help relax and lull your baby into a deep and peace... Infantree http://www.infantree.net/shop/image/cache/data/Tomy Disney/456933_2-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Tomy Disney/456933_... Product Code: 79.00 59.90 A Tomy Disney Suya Suya Melody Baby Mickey http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=946
7 The First Years Disney <article><p>This soft and soothing plush toy comes with melodies and a mother's fetal sound to help relax and lull your baby into a deep and peace... Infantree http://www.infantree.net/shop/image/cache/data/Tomy Disney/TD456940-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Tomy Disney/TD45694... Product Code: 79.00 59.90 A Tomy Disney Suya Suya Melody Baby Minnie http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=1122
8 Zibos <article><p><img></p><p><strong>Zibos Ala Bedside Crib - Blue (With Travel Bag &amp; Mosquito Net)</strong></p><ul><li>Suitable from birth - 6 mon... Infantree http://www.infantree.net/shop/image/cache/data/Zibos/blue-1 (Copy)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos/blue-1 (Copy)-... Product Code: NaN NaN A Zibos Ala Bedside Crib - Blue (With Travel Bag & Mosquito Net) http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=857
9 Zibos <article><p><img></p><p><strong>Zibos Ala Bedside Crib - Blue (With Travel Bag &amp; Mosquito Net)</strong></p><ul><li>Suitable from birth - 6 mon... Infantree http://www.infantree.net/shop/image/cache/data/Zibos/sand-1 (Copy)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos/sand-1 (Copy)-... Product Code: NaN NaN A Zibos Ala Bedside Crib - Sand (With Travel Bag & Mosquito Net) http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=859
10 Zibos <article><h2><u>Zibos Ama Bedside Crib - Grey (With Travel Bag &amp; Mosquito Net)</u></h2><ul><li>Suitable from birth - 6 months</li><li>Designed... Infantree http://www.infantree.net/shop/image/cache/data/Zibos/Ama - Britain Grey (rocking)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos... Product Code: NaN NaN A Zibos Ama Bedside Crib - Grey (With Travel Bag & Mosquito Net) http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=1133
11 Zibos <article><h2><u>Zibos Ama Bedside Crib - Sand (With Travel Bag &amp; Mosquito Net)</u></h2><ul><li>Suitable from birth - 6 months</li><li>Designed... Infantree http://www.infantree.net/shop/image/cache/data/Zibos/Ama - Britain Sand (rocking)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos... Product Code: NaN NaN A Zibos Ama Bedside Crib - Sand (With Travel Bag & Mosquito Net) http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=1134
12 Zibos <article><p><img></p><p><strong>Zibos Ala Bedside Crib - Blue (With Travel Bag &amp; Mosquito Net)</strong></p><ul><li>Suitable from birth - 6 mon... Infantree http://www.infantree.net/shop/image/cache/data/Zibos/blue-1 (Copy)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos/blue-1 (Copy)-... Product Code: NaN NaN A Zibos Ala Bedside Crib - Blue (With Travel Bag & Mosquito Net) http://www.infantree.net/shop/index.php?route=product/product&path=71_73&product_id=857
13 Zibos <article><p><img></p><p><strong>Zibos Ala Bedside Crib - Blue (With Travel Bag &amp; Mosquito Net)</strong></p><ul><li>Suitable from birth - 6 mon... Infantree http://www.infantree.net/shop/image/cache/data/Zibos/sand-1 (Copy)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos/sand-1 (Copy)-... Product Code: NaN NaN A Zibos Ala Bedside Crib - Sand (With Travel Bag & Mosquito Net) http://www.infantree.net/shop/index.php?route=product/product&path=71_73&product_id=859
14 Zibos <article><h2><u>Zibos Ama Bedside Crib - Grey (With Travel Bag &amp; Mosquito Net)</u></h2><ul><li>Suitable from birth - 6 months</li><li>Designed... Infantree http://www.infantree.net/shop/image/cache/data/Zibos/Ama - Britain Grey (rocking)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos... Product Code: NaN NaN A Zibos Ama Bedside Crib - Grey (With Travel Bag & Mosquito Net) http://www.infantree.net/shop/index.php?route=product/product&path=71_73&product_id=1133
15 Zibos <article><h2><u>Zibos Ama Bedside Crib - Sand (With Travel Bag &amp; Mosquito Net)</u></h2><ul><li>Suitable from birth - 6 months</li><li>Designed... Infantree http://www.infantree.net/shop/image/cache/data/Zibos/Ama - Britain Sand (rocking)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos... Product Code: NaN NaN A Zibos Ama Bedside Crib - Sand (With Travel Bag & Mosquito Net) http://www.infantree.net/shop/index.php?route=product/product&path=71_73&product_id=1134
16 The First Years <article><p>The First Years deceptively simple Close &amp; Secure Sleeper allows you to feed, soothe, monitor, and bond with baby in the comfort o... Infantree http://www.infantree.net/shop/image/cache/data/TFY Products/TFY3171-1-500x500.jpg,http://www.infantree.net/shop/image/cache/data/TFY Products/TFY3... Product Code: NaN NaN A THE FIRST YEARS Close & Secure Sleeper http://www.infantree.net/shop/index.php?route=product/product&path=71_74&product_id=287
17 Clearance <article><p>An atmosphere of enchanted stars and distant galaxies soothes baby in the cot thanks to the innovative system of double projection. Th... Infantree http://www.infantree.net/shop/image/cache/data/Chicco Products/CC024421_Magic Star Cot Mobile - Pink(1) (Copy)-500x500.jpg,http://www.infantree.ne... Product Code: 119.00 79.00 A CHICCO Magic Star Cot Mobile - Pink http://www.infantree.net/shop/index.php?route=product/product&path=71_91&product_id=450
18 Clearance <article><p>Chicco Sweet Dream Musical Cot Mobile has a rotational motor that gently spins while playing a musical melody. The mobile has three cu... Infantree http://www.infantree.net/shop/image/cache/data/Chicco Products/CC917298_CHICCO Natural Colors Cot Mobile - Sweet Dreams(1) (Copy)-500x500.jpg,http... Product Code: 79.00 47.40 A CHICCO Natural Colors Cot Mobile - Sweet Dreams http://www.infantree.net/shop/index.php?route=product/product&path=71_91&product_id=452
19 The First Years Disney <article><p>The one and only musical mobile for all cots, bedside cribs, playpens, etc. You don't have to worry if the mobile can be attached to t... Infantree http://www.infantree.net/shop/image/cache/data/Tomy Disney/429579-500x500.jpeg,http://www.infantree.net/shop/image/cache/data/Tomy Disney/429579-5... Product Code: NaN NaN A Tomy Disney Home Theatre (Disney Character) http://www.infantree.net/shop/index.php?route=product/product&path=71_91&product_id=1056
Process finished with exit code 0
Reply
#5
(Jun-06-2018, 04:53 AM)PrateekG Wrote: 1. 'list index out of range' in get_product_details() function.
I don't see this error in your output.... When you are not able to debug, don't mask the full traceback...
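
To illustrate the point about not masking the traceback: printing only `str(exc)` keeps the message but drops the file, line number, and failing expression. A minimal sketch using the standard-library `traceback` module; `fourth_span` and `parse_product` are hypothetical stand-ins for the real parsing code, not functions from the thread:

```python
import traceback


def fourth_span(spans):
    """Fails with IndexError when fewer than four items are present."""
    return spans[3]


def parse_product(spans):
    """Hypothetical stand-in for get_product_details()."""
    try:
        return fourth_span(spans)
    except Exception:
        # format_exc() returns the full traceback of the exception being
        # handled, including the line that raised it, not just the message.
        print(traceback.format_exc())
        return None


result = parse_product(["Brand:", "Some Brand"])
```

With the full traceback printed, the exact line that raised the IndexError is visible, which is what the post above is asking for.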
Reply
#6
First Issue-
check_merchant_product_id = soup.find('div', attrs={'class': 'description'}).findAll('span')[3]
if check_merchant_product_id is not None:
    merchant_product_id = check_merchant_product_id.text
if merchant_product_id is not None:
    prod_details['merchant_product_id'] = merchant_product_id
Error:
getting 'Product Code:' instead of value
Second Issue-
prod_details['merchant_image_urls'] = ",".join(list(filter(None, map(lambda x: x['href'].replace(",", "%2C"),
                                                                                         soup.find('div', attrs={
                                                                                             'class': 'left'}).findAll(
                                                                                             'a')))))
Error:
getting broken image urls
Third Issue-
check_price = soup.find('span', attrs={"class": "price-old"})
if check_price is not None:
    prod_details['price'] = check_price.text.replace("SGD $", "")
check_sale_price = soup.find('span', attrs={"class": "price-new"})
if check_sale_price is not None:
    prod_details['sale_price'] = check_sale_price.text.replace("SGD $", "")
Error:
getting NaN for some products
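
A rough sketch of how these three issues might be approached. The markup below is made up and only assumed to resemble the product page (the real HTML should be checked first): the product code is assumed to be plain text right after its label span, products without a discount are assumed to show their price in a plain `div` with class `price`, and the image URLs are assumed to break because of unencoded spaces like those visible in the output in post #4. `value_after_label`, `get_price`, and `encode_url` are hypothetical helper names.

```python
from urllib.parse import quote

from bs4 import BeautifulSoup

# Made-up markup, assumed to resemble the product page.
SAMPLE = """
<div class="description">
  <span>Brand:</span> <a href="#">Some Brand</a><br>
  <span>Product Code:</span> ABC123<br>
</div>
<div class="price">SGD $79.00</div>
"""


def value_after_label(soup, label):
    """Return the text node that follows <span>label</span>, or None."""
    span = soup.find("span", string=label)
    if span is None or span.next_sibling is None:
        return None
    return str(span.next_sibling).strip() or None


def get_price(soup):
    """Prefer span.price-old; fall back to the plain div.price text."""
    tag = soup.find("span", class_="price-old")
    if tag is None:
        tag = soup.find("div", class_="price")
    if tag is None:
        return None
    return tag.get_text(strip=True).replace("SGD $", "")


def encode_url(url):
    """Percent-encode spaces, commas and brackets; keep URL delimiters."""
    return quote(url, safe=":/?&=%")


soup = BeautifulSoup(SAMPLE, "html.parser")
print(value_after_label(soup, "Product Code:"))
print(get_price(soup))
print(encode_url("http://example.com/image/a b,c (Copy).jpg"))
```

Looking the span up by its label text rather than by position also avoids depending on how many spans a given page happens to contain.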
Reply
#7
I give up... Sorry, but you are a hopeless case...
Reply
#8
(Jun-06-2018, 09:29 AM)buran Wrote: You are the one that needs help. Help us to help you. I don't want to run your code to find the error myself. If for nothing else, I may have a different setup... You must provide the full traceback in error tags...
Also, URL is not defined in your code, so I cannot run it, even if I were willing to do so. Which I am not....

I have posted my code, logs, and issues.
I hope you can help me now!

It seems you are not interested in helping, but rather in creating issues.

(Jun-06-2018, 10:21 AM)buran Wrote:
(Jun-06-2018, 04:53 AM)PrateekG Wrote: 1. 'list index out of range' in get_product_details() function.
I don't see this error in your output.... When you are not able to debug, don't mask the full traceback...

Just look at the first line of the output instead of creating an issue.
It clearly says: list index out of range
Reply
#9
The whole thing is messy, as pointed out by @buran.
It's not code that we can run: it is missing the imports and the function calls.
attrs={"class": "buy"}
Do you find a class='buy' on that site?

Just to show a quick test, with code that can be run (no missing parts).
It takes out the names of the featured products; class_='price' would take out the price.
import requests
from bs4 import BeautifulSoup

url = 'http://www.infantree.net/shop/'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
#print(soup)
featured = soup.find('section', class_="featured span")
li = featured.find_all('li')
for item in li:
    print(item.find('div', class_='name').text.strip())
Output:
THE FIRST YEARS Bottle Warmer and Cooler
THE FIRST YEARS Steam Sterilizer
AMEDA Store'N Pour Breast Milk Storage Bags
AMEDA Lactaline Personal Breastpump
LASCAL Kiddy Guard Accent - Black
Do small tests like this first to make sure that everything works.
If you get an error in short code like this, it's easy to see what's wrong.
Then you can structure it into functions (not too much code in one function).
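
One way to follow this advice is to try a small function against a short inline HTML string before pointing it at the real site. The markup below is made up for illustration and only mimics the `div` with class `name` used in the test above; `product_names` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics the featured-products list.
SAMPLE = """
<section class="featured span">
  <ul>
    <li><div class="name"><a href="#"> Bottle Warmer </a></div></li>
    <li><div class="name"><a href="#">Steam Sterilizer</a></div></li>
    <li><div class="image"></div></li>
  </ul>
</section>
"""


def product_names(soup):
    """Collect the product names, skipping list items without a name div."""
    names = []
    for item in soup.find_all("li"):
        name = item.find("div", class_="name")
        if name is not None:
            names.append(name.get_text(strip=True))
    return names


soup = BeautifulSoup(SAMPLE, "html.parser")
print(product_names(soup))
```

Because the function does one thing and checks for a missing `div`, a failure here is easy to trace before the function is used inside the multiprocessing code.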
Reply