Posts: 88
Threads: 33
Joined: Apr 2018
Jun-06-2018, 04:53 AM
(This post was last modified: Jun-06-2018, 06:54 AM by buran.)
Hi All,
I have written the following functions in Python 3.6 for fetching product-related details from the site:
def get_soup(url):
    soup = None
    try:
        response = requests.get(url)
        if response.status_code == 200:
            html = response.content
            soup = BeautifulSoup(html, "html.parser")
    except Exception as exc:
        print("Unable to fetch data due to..", str(exc))
    finally:
        return soup
def get_product_details(url):
    soup = get_soup(url)
    sleep(1)
    try:
        product_shop = soup.find('div', attrs={"class": "buy"})
        if product_shop is not None:
            available_product_shop = soup.findAll('div')[2].find('span').text == "In Stock"
            if available_product_shop is not None:
                prod_details = dict()
                merchant_product_id = soup.find('div', attrs={'class': 'description'}).findAll('span')[3].text
                if merchant_product_id is not None:
                    prod_details['merchant_product_id'] = merchant_product_id
                check_title = soup.find('header', attrs={'class': 'product-name'}).find('h1')
                if check_title is not None:
                    prod_details['title'] = check_title.text
                check_description = soup.find('div', attrs={'id': 'tab-description'})
                if check_description is not None:
                    prod_details['description'] = clean_description(check_description)
                check_brand = soup.find('div', attrs={'class': 'description'}).findAll('span')[2].find('a')
                if check_brand is not None:
                    prod_details['brand'] = check_brand.text
                prod_details['google_product_category'] = CATEGORY_ID
                prod_details['web_url'] = url
                prod_details['merchant_image_urls'] = ",".join(list(filter(None, map(
                    lambda x: x['href'].replace(",", "%2C"),
                    soup.find('div', attrs={'class': 'left'}).findAll('a')))))
                check_price = soup.find('span', attrs={"class": "price-old"})
                if check_price is not None:
                    prod_details['price'] = check_price.text.replace("SGD $", "")
                check_sale_price = soup.find('span', attrs={"class": "price-new"})
                if check_sale_price is not None:
                    prod_details['sale_price'] = check_sale_price.text.replace("SGD $", "")
                return prod_details
    except Exception as exc:
        print("Error..", str(exc))


def get_all_products(url):
    prod_urls = []
    soup = get_soup(url)
    prod_urls.append(get_product_urls(soup))
    links = get_pagination(soup)
    if not links:
        return prod_urls
    for link in links:
        soup = get_soup(link)
        prod_urls.append(get_product_urls(soup))
    print("Found following product urls:", prod_urls)
    return prod_urls
def get_product_urls(soup):
    links = soup.select('div.product-list .span .name a')
    if links is not None:
        return [link['href'] for link in links]


def get_pagination(soup):
    pages = soup.select('div.pagination div.links a')
    if pages is not None:
        return [link['href'] for link in pages if link.string.isdecimal()]
def get_category_urls(url):
    soup = get_soup(url)
    cat_urls = []
    try:
        categories = soup.find('div', attrs={'id': 'menu_oc'})
        if categories is not None:
            for c in categories.findAll('a'):
                if c['href'] is not None:
                    cat_urls.append(c['href'])
    except Exception as exc:
        print("Unable to fetch category urls due to..", str(exc))
    finally:
        print("Found following category urls::", cat_urls)
        return cat_urls


def flatten(items):
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x
if __name__ == '__main__':
    category_urls = get_category_urls(URL)
    with Pool(20) as p:
        product_urls = p.map(get_all_products, category_urls)
    #product_urls = list(filter(None, product_urls))
    product_urls_flat = list(flatten(product_urls))
    with Pool(20) as p:
        products = p.map(get_product_details, product_urls_flat)
    products = list(filter(None, products))
    products_df = pd.DataFrame(products)
    print(products_df.head())

When I run the above code, I get the following issues:
1. 'list index out of range' in the get_product_details() function.
2. I'm not getting the correct values for brand, image URLs and product ID.
Can anyone please run my code and share the corrected version?
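The 'list index out of range' message presumably comes from lines such as `soup.findAll('div')[2]` or `.findAll('span')[3]`: when a page has fewer matching elements than expected, indexing raises IndexError before the `is not None` check on the next line ever runs. (Separately, `available_product_shop` is the result of `==`, i.e. a bool, so `available_product_shop is not None` is always true.) A minimal sketch of a hypothetical helper, `nth_or_none`, that checks the length before indexing:

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def nth_or_none(items: Sequence[T], index: int) -> Optional[T]:
    """Return items[index], or None if the index is out of range."""
    # items[index] on a too-short list raises IndexError immediately,
    # so an "if x is not None" check written after it never gets to run.
    if -len(items) <= index < len(items):
        return items[index]
    return None
```

Assuming the `description` div itself was found, something like `nth_or_none(description.find_all('span'), 3)` would then yield None instead of raising, and the existing `is not None` checks would start doing their job.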
Posts: 88
Threads: 33
Joined: Apr 2018
I have shared the entire code so that you can run it on your machine and find out what I am doing wrong.
Posts: 8,160
Threads: 160
Joined: Sep 2016
You are the one that needs help. Help us to help you. I don't want to run your code to find the error myself. If not for anything else, I may have different setup... You must provide the full traceback in error tags...
Also URL is not defined in your code, so I cannot run it, even if I am willing to do so. Which I am not....
Posts: 88
Threads: 33
Joined: Apr 2018
Code:
def get_product_details(url):
    soup = get_soup(url)
    sleep(1)
    try:
        product_shop = soup.find('div', attrs={"class": "buy"})
        if product_shop is not None:
            available_product_shop = soup.findAll('div')[2].find('span').text == "In Stock"
            if available_product_shop is not None:
                prod_details = dict()
                check_merchant_product_id = soup.find('div', attrs={'class': 'description'}).findAll('span')[3]
                if check_merchant_product_id is not None:
                    merchant_product_id = check_merchant_product_id.text
                    if merchant_product_id is not None:
                        prod_details['merchant_product_id'] = merchant_product_id
                check_title = soup.find('header', attrs={'class': 'product-name'}).find('h1')
                if check_title is not None:
                    prod_details['title'] = check_title.text
                check_description = soup.find('div', attrs={'id': 'tab-description'})
                if check_description is not None:
                    prod_details['description'] = clean_description(check_description)
                check_brand = soup.find('div', attrs={'class': 'description'}).find('a')
                if check_brand is not None:
                    prod_details['brand'] = check_brand.text
                prod_details['google_product_category'] = CATEGORY_ID
                prod_details['status'] = 'A'
                prod_details['web_url'] = url
                prod_details['merchant_image_urls'] = ",".join(list(filter(None, map(
                    lambda x: x['href'].replace(",", "%2C"),
                    soup.find('div', attrs={'class': 'left'}).findAll('a')))))
                print("images::::", prod_details['merchant_image_urls'])
                check_price = soup.find('span', attrs={"class": "price-old"})
                if check_price is not None:
                    prod_details['price'] = check_price.text.replace("SGD $", "")
                check_sale_price = soup.find('span', attrs={"class": "price-new"})
                if check_sale_price is not None:
                    prod_details['sale_price'] = check_sale_price.text.replace("SGD $", "")
                return prod_details
    except requests.ConnectionError as cexc:
        print("Connection Error! Make sure you are connected to Internet.\n", str(cexc))
    except requests.Timeout as texc:
        print("Timeout Error.\n", str(texc))
    except requests.RequestException as rexc:
        print("following Request Exception occurred.\n", str(rexc))
    except Exception as exc:
        print("Unable to fetch product details due to..", str(exc))

Output:
Unable to fetch product details due to.. list index out of range
Unable to fetch product details due to.. list index out of range
Unable to fetch product details due to.. list index out of range
Unable to fetch product details due to.. list index out of range
Unable to fetch product details due to.. list index out of range
Unable to fetch product details due to.. list index out of range
Unable to fetch product details due to.. list index out of range
Finished extracting data, the process took 0:06:54.281451
brand description google_product_category merchant_image_urls merchant_product_id price sale_price status title web_url
0 Clearance <article><p>An atmosphere of enchanted stars and distant galaxies soothes baby in the cot thanks to the innovative system of double projection. Th... Infantree http://www.infantree.net/shop/image/cache/data/Chicco Products/CC024421_Magic Star Cot Mobile - Pink(1) (Copy)-500x500.jpg,http://www.infantree.ne... Product Code: 119.00 79.00 A CHICCO Magic Star Cot Mobile - Pink http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=450
1 Clearance <article><p>Chicco Sweet Dream Musical Cot Mobile has a rotational motor that gently spins while playing a musical melody. The mobile has three cu... Infantree http://www.infantree.net/shop/image/cache/data/Chicco Products/CC917298_CHICCO Natural Colors Cot Mobile - Sweet Dreams(1) (Copy)-500x500.jpg,http... Product Code: 79.00 47.40 A CHICCO Natural Colors Cot Mobile - Sweet Dreams http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=452
2 The First Years <article><p>Product Description<br> Research has shown the huge importance of ensuring babies are correctly positioned when put down to sleep. In ... Infantree http://www.infantree.net/shop/image/cache/data/TFY Products/Airflow/Air flow sleep positioner 5 3inch-500x500.jpg,http://www.infantree.net/shop/i... Product Code: NaN NaN A THE FIRST YEARS Air-Flow Sleep Positioner 5 Inch http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=285
3 The First Years <article><p>Research has shown the huge importance of ensuring babies are correctly positioned when put down to sleep. In fact, the danger of Sudd... Infantree http://www.infantree.net/shop/image/cache/data/TFY Products/Airflow/airflow posotioner w wedge3-500x500.jpg,http://www.infantree.net/shop/image/ca... Product Code: NaN NaN A THE FIRST YEARS Airflow Positioner with Wedge http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=284
4 The First Years <article><p>The First Years deceptively simple Close & Secure Sleeper allows you to feed, soothe, monitor, and bond with baby in the comfort o... Infantree http://www.infantree.net/shop/image/cache/data/TFY Products/TFY3171-1-500x500.jpg,http://www.infantree.net/shop/image/cache/data/TFY Products/TFY3... Product Code: NaN NaN A THE FIRST YEARS Close & Secure Sleeper http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=287
5 The First Years <article><p></p><p>Enjoy having everything you need to care for and groom your baby, all in 1 convenient storage case.</p><p>The Deluxe Baby Groom... Infantree http://www.infantree.net/shop/image/cache/data/TFY Products/tfy7056_2 (Copy)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/TFY Produc... Product Code: NaN NaN A THE FIRST YEARS Deluxe Grooming Kit http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=908
6 The First Years Disney <article><p>This soft and soothing plush toy comes with melodies and a mother's fetal sound to help relax and lull your baby into a deep and peace... Infantree http://www.infantree.net/shop/image/cache/data/Tomy Disney/456933_2-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Tomy Disney/456933_... Product Code: 79.00 59.90 A Tomy Disney Suya Suya Melody Baby Mickey http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=946
7 The First Years Disney <article><p>This soft and soothing plush toy comes with melodies and a mother's fetal sound to help relax and lull your baby into a deep and peace... Infantree http://www.infantree.net/shop/image/cache/data/Tomy Disney/TD456940-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Tomy Disney/TD45694... Product Code: 79.00 59.90 A Tomy Disney Suya Suya Melody Baby Minnie http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=1122
8 Zibos <article><p><img></p><p><strong>Zibos Ala Bedside Crib - Blue (With Travel Bag & Mosquito Net)</strong></p><ul><li>Suitable from birth - 6 mon... Infantree http://www.infantree.net/shop/image/cache/data/Zibos/blue-1 (Copy)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos/blue-1 (Copy)-... Product Code: NaN NaN A Zibos Ala Bedside Crib - Blue (With Travel Bag & Mosquito Net) http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=857
9 Zibos <article><p><img></p><p><strong>Zibos Ala Bedside Crib - Blue (With Travel Bag & Mosquito Net)</strong></p><ul><li>Suitable from birth - 6 mon... Infantree http://www.infantree.net/shop/image/cache/data/Zibos/sand-1 (Copy)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos/sand-1 (Copy)-... Product Code: NaN NaN A Zibos Ala Bedside Crib - Sand (With Travel Bag & Mosquito Net) http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=859
10 Zibos <article><h2><u>Zibos Ama Bedside Crib - Grey (With Travel Bag & Mosquito Net)</u></h2><ul><li>Suitable from birth - 6 months</li><li>Designed... Infantree http://www.infantree.net/shop/image/cache/data/Zibos/Ama - Britain Grey (rocking)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos... Product Code: NaN NaN A Zibos Ama Bedside Crib - Grey (With Travel Bag & Mosquito Net) http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=1133
11 Zibos <article><h2><u>Zibos Ama Bedside Crib - Sand (With Travel Bag & Mosquito Net)</u></h2><ul><li>Suitable from birth - 6 months</li><li>Designed... Infantree http://www.infantree.net/shop/image/cache/data/Zibos/Ama - Britain Sand (rocking)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos... Product Code: NaN NaN A Zibos Ama Bedside Crib - Sand (With Travel Bag & Mosquito Net) http://www.infantree.net/shop/index.php?route=product/product&path=71&product_id=1134
12 Zibos <article><p><img></p><p><strong>Zibos Ala Bedside Crib - Blue (With Travel Bag & Mosquito Net)</strong></p><ul><li>Suitable from birth - 6 mon... Infantree http://www.infantree.net/shop/image/cache/data/Zibos/blue-1 (Copy)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos/blue-1 (Copy)-... Product Code: NaN NaN A Zibos Ala Bedside Crib - Blue (With Travel Bag & Mosquito Net) http://www.infantree.net/shop/index.php?route=product/product&path=71_73&product_id=857
13 Zibos <article><p><img></p><p><strong>Zibos Ala Bedside Crib - Blue (With Travel Bag & Mosquito Net)</strong></p><ul><li>Suitable from birth - 6 mon... Infantree http://www.infantree.net/shop/image/cache/data/Zibos/sand-1 (Copy)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos/sand-1 (Copy)-... Product Code: NaN NaN A Zibos Ala Bedside Crib - Sand (With Travel Bag & Mosquito Net) http://www.infantree.net/shop/index.php?route=product/product&path=71_73&product_id=859
14 Zibos <article><h2><u>Zibos Ama Bedside Crib - Grey (With Travel Bag & Mosquito Net)</u></h2><ul><li>Suitable from birth - 6 months</li><li>Designed... Infantree http://www.infantree.net/shop/image/cache/data/Zibos/Ama - Britain Grey (rocking)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos... Product Code: NaN NaN A Zibos Ama Bedside Crib - Grey (With Travel Bag & Mosquito Net) http://www.infantree.net/shop/index.php?route=product/product&path=71_73&product_id=1133
15 Zibos <article><h2><u>Zibos Ama Bedside Crib - Sand (With Travel Bag & Mosquito Net)</u></h2><ul><li>Suitable from birth - 6 months</li><li>Designed... Infantree http://www.infantree.net/shop/image/cache/data/Zibos/Ama - Britain Sand (rocking)-500x500.jpg,http://www.infantree.net/shop/image/cache/data/Zibos... Product Code: NaN NaN A Zibos Ama Bedside Crib - Sand (With Travel Bag & Mosquito Net) http://www.infantree.net/shop/index.php?route=product/product&path=71_73&product_id=1134
16 The First Years <article><p>The First Years deceptively simple Close & Secure Sleeper allows you to feed, soothe, monitor, and bond with baby in the comfort o... Infantree http://www.infantree.net/shop/image/cache/data/TFY Products/TFY3171-1-500x500.jpg,http://www.infantree.net/shop/image/cache/data/TFY Products/TFY3... Product Code: NaN NaN A THE FIRST YEARS Close & Secure Sleeper http://www.infantree.net/shop/index.php?route=product/product&path=71_74&product_id=287
17 Clearance <article><p>An atmosphere of enchanted stars and distant galaxies soothes baby in the cot thanks to the innovative system of double projection. Th... Infantree http://www.infantree.net/shop/image/cache/data/Chicco Products/CC024421_Magic Star Cot Mobile - Pink(1) (Copy)-500x500.jpg,http://www.infantree.ne... Product Code: 119.00 79.00 A CHICCO Magic Star Cot Mobile - Pink http://www.infantree.net/shop/index.php?route=product/product&path=71_91&product_id=450
18 Clearance <article><p>Chicco Sweet Dream Musical Cot Mobile has a rotational motor that gently spins while playing a musical melody. The mobile has three cu... Infantree http://www.infantree.net/shop/image/cache/data/Chicco Products/CC917298_CHICCO Natural Colors Cot Mobile - Sweet Dreams(1) (Copy)-500x500.jpg,http... Product Code: 79.00 47.40 A CHICCO Natural Colors Cot Mobile - Sweet Dreams http://www.infantree.net/shop/index.php?route=product/product&path=71_91&product_id=452
19 The First Years Disney <article><p>The one and only musical mobile for all cots, bedside cribs, playpens, etc. You don't have to worry if the mobile can be attached to t... Infantree http://www.infantree.net/shop/image/cache/data/Tomy Disney/429579-500x500.jpeg,http://www.infantree.net/shop/image/cache/data/Tomy Disney/429579-5... Product Code: NaN NaN A Tomy Disney Home Theatre (Disney Character) http://www.infantree.net/shop/index.php?route=product/product&path=71_91&product_id=1056
Process finished with exit code 0
Posts: 8,160
Threads: 160
Joined: Sep 2016
Jun-06-2018, 10:21 AM
(This post was last modified: Jun-06-2018, 10:21 AM by buran.)
(Jun-06-2018, 04:53 AM)PrateekG Wrote: 1. 'list index out of range' in get_product_details() function.
I don't see this error in your output... When you are not able to debug, don't mask the full traceback...
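To make buran's point concrete: `print(str(exc))` keeps only the message ('list index out of range') and throws away the file and line that raised it. A small sketch with the standard `traceback` module; `risky` and `run_and_report` are illustrative names, not from the thread:

```python
import traceback


def risky(items):
    # Hypothetical stand-in for a line like soup.findAll('span')[3]
    return items[3]


def run_and_report(func, *args):
    """Call func(*args); return (result, None) or (None, full traceback text)."""
    try:
        return func(*args), None
    except Exception:
        # traceback.format_exc() keeps the file name and line number
        # (plus the source line when available), which str(exc) discards.
        return None, traceback.format_exc()


if __name__ == "__main__":
    result, report = run_and_report(risky, ["a", "b"])
    if report:
        print(report)
```

In the scraper itself, calling `traceback.print_exc()` or `logging.exception("...")` inside the existing `except` blocks would give the same full report without restructuring anything.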
Posts: 88
Threads: 33
Joined: Apr 2018
First issue:
check_merchant_product_id = soup.find('div', attrs={'class': 'description'}).findAll('span')[3]
if check_merchant_product_id is not None:
    merchant_product_id = check_merchant_product_id.text
    if merchant_product_id is not None:
        prod_details['merchant_product_id'] = merchant_product_id
Error: getting 'Product Code:' instead of the actual value
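Getting 'Product Code:' suggests that this span holds only the label and the code itself sits next to it as plain text. Assuming that layout (not verified against the site), one option is to take the text of the span's parent, e.g. `span.parent.get_text(" ", strip=True)` in BeautifulSoup, and keep whatever follows the label; `span.next_sibling` is another candidate. The hypothetical `value_after_label` below does the string part:

```python
from typing import Optional


def value_after_label(text: str, label: str) -> Optional[str]:
    """Return the text after `label`, or None if the label or its value is missing."""
    # e.g. "Product Code: ABC-123" with label "Product Code:" -> "ABC-123"
    _, sep, tail = text.partition(label)
    if not sep:
        return None  # label not present at all
    value = tail.strip()
    return value or None  # label present but nothing after it
```

Returning None for an empty value keeps it consistent with the `is not None` checks already used in `get_product_details()`.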
Second issue:
prod_details['merchant_image_urls'] = ",".join(list(filter(None, map(
    lambda x: x['href'].replace(",", "%2C"),
    soup.find('div', attrs={'class': 'left'}).findAll('a')))))
Error: getting broken image URLs
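The image URLs in the output table contain raw spaces and parentheses (for example `.../Chicco Products/...`), while the code only escapes commas. One likely reason they come out broken is that the rest of the path is never percent-encoded. A sketch using the standard `urllib.parse` module; `encode_image_url` and `join_image_urls` are hypothetical names:

```python
from typing import Iterable, Optional
from urllib.parse import quote, urlsplit, urlunsplit


def encode_image_url(url: str) -> str:
    """Percent-encode the path of an image URL (spaces, commas, parentheses, ...)."""
    parts = urlsplit(url)
    # Keep "/" so the path structure survives, and "%" so an already
    # encoded sequence such as %20 is not encoded a second time.
    safe_path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, safe_path, parts.query, parts.fragment))


def join_image_urls(hrefs: Iterable[Optional[str]]) -> str:
    """Encode every non-empty href and join them with commas."""
    return ",".join(encode_image_url(h) for h in hrefs if h)
```

Because commas inside a path become `%2C`, the comma separator in the joined string stays unambiguous, which is what the original `.replace(",", "%2C")` was after.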
Third issue:
check_price = soup.find('span', attrs={"class": "price-old"})
if check_price is not None:
    prod_details['price'] = check_price.text.replace("SGD $", "")
check_sale_price = soup.find('span', attrs={"class": "price-new"})
if check_sale_price is not None:
    prod_details['sale_price'] = check_sale_price.text.replace("SGD $", "")
Error: getting NaN for some products
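The NaN values are pandas filling in keys that are missing from some of the dicts: when neither `price-old` nor `price-new` is found, `price` and `sale_price` are never set. One plausible cause (an assumption, not checked against the site) is that products without a discount show a single price element with a different class, which snippsat's `class_='price'` remark hints at. A sketch with hypothetical `parse_price` and `pick_prices` helpers, where `plain_text` stands for the text of that assumed single price element:

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# First number in the text, allowing thousands separators and decimals.
PRICE_RE = re.compile(r"(\d[\d,]*(?:\.\d+)?)")


def parse_price(text: Optional[str]) -> Optional[Decimal]:
    """Turn text like 'SGD $1,299.50' into Decimal('1299.50'); None if no number."""
    if not text:
        return None
    match = PRICE_RE.search(text)
    if match is None:
        return None
    return Decimal(match.group(1).replace(",", ""))


def pick_prices(old_text: Optional[str], new_text: Optional[str],
                plain_text: Optional[str]) -> Tuple[Optional[Decimal], Optional[Decimal]]:
    """Return (price, sale_price), falling back to a single plain price."""
    if old_text is not None or new_text is not None:
        return parse_price(old_text), parse_price(new_text)
    return parse_price(plain_text), None
```

Parsing into `Decimal` also avoids depending on the exact "SGD $" prefix that `.replace()` relies on.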
Posts: 8,160
Threads: 160
Joined: Sep 2016
I give up... Sorry, but you are a hopeless case...
Posts: 88
Threads: 33
Joined: Apr 2018
Jun-06-2018, 10:31 AM
(This post was last modified: Jun-06-2018, 11:21 AM by PrateekG.)
(Jun-06-2018, 09:29 AM)buran Wrote: You are the one that needs help. Help us to help you. I don't want to run your code to find the error myself. If not for anything else, I may have different setup... You must provide the full traceback in error tags... Also URL is not defined in your code, so I cannot run it, even if I am willing to do so. Which I am not....
I have posted my code, logs and issues.
Hoping now you can help me!
It seems you are not interested in helping, but rather in creating issues.
(Jun-06-2018, 10:21 AM)buran Wrote: (Jun-06-2018, 04:53 AM)PrateekG Wrote: 1. 'list index out of range' in get_product_details() function.
I don't see this error in your output... When you are not able to debug, don't mask the full traceback...
Just look at the first line of the output instead of creating issues.
It clearly says: list index out of range
Posts: 7,320
Threads: 123
Joined: Sep 2016
The whole thing is messy, as pointed out by @buran.
It's not code that we can run: the imports are missing, and so is the call to the function.
attrs={"class": "buy"}: do you find a class='buy' on that site?
Just to show a quick test, with code that can be run (no missing parts).
This takes out the names of the featured products; class_='price' will take out the prices.
import requests
from bs4 import BeautifulSoup

url = 'http://www.infantree.net/shop/'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
#print(soup)
featured = soup.find('section', class_="featured span")
li = featured.find_all('li')
for item in li:
    print(item.find('div', class_='name').text.strip())

Output:
THE FIRST YEARS Bottle Warmer and Cooler
THE FIRST YEARS Steam Sterilizer
AMEDA Store'N Pour Breast Milk Storage Bags
AMEDA Lactaline Personal Breastpump
LASCAL Kiddy Guard Accent - Black
Do a small test like this first to make sure that everything works.
If you get an error in this short code, it's easy to see what's wrong.
Then you can structure it in functions (not too much code in one function).
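In the same spirit of small tests, the question of whether a `class='buy'` exists on the page can be answered by listing the classes that actually occur in the HTML. A sketch using only the standard library's `html.parser` (no BeautifulSoup needed); `ClassCounter` and `count_classes` are made-up names:

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count how often each (tag, class) pair occurs in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # One class attribute can hold several space-separated classes.
                for cls in value.split():
                    self.counts[(tag, cls)] += 1


def count_classes(html: str) -> Counter:
    parser = ClassCounter()
    parser.feed(html)
    parser.close()
    return parser.counts
```

With the page from the quick test, something like `count_classes(requests.get(url).text)[('div', 'buy')]` returning 0 would confirm that `soup.find('div', attrs={"class": "buy"})` can never match.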