Python Forum
how to make my product description fetching function generic?
#6
The complete code:

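import logging

import requests
from bs4 import BeautifulSoup

# get_product_details() logs through `logger`, so it has to be defined
logger = logging.getLogger(__name__)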


def get_soup(url):

    try:
        response = requests.get(url)
        if response.status_code == 200:
            html = response.content
            return BeautifulSoup(html, "html.parser")
    except Exception as ex:
        print("error from " + url + ": " + str(ex))


def get_product_details(url):

    try:
        soup = get_soup(url)

        desc_list = []
        # Get the product Name before the closing tags inserted by bs4
        desc_list.append(soup.select('#detail-right'))
        start_tag = '<input id="itemPrice1" name="nuPrice1" type="hidden" value=""/>'
        end_tag = '<div id="qty">'

        # Loop through tags and append those between start_tag and end_tag
        flag_append = False
        for content in soup.find_all():
            markup = str(content)
            if start_tag in markup:
                flag_append = True
            if end_tag in markup:
                break
            if flag_append:
                desc_list.append(content.contents)

        prod_details = {}
        prod_details['description'] = ''.join([str(i) for i in desc_list])
        for item in desc_list:
            if item:
                print(item)
        return prod_details
    except Exception as ex:
        logger.warning('%s - %s', ex, url)


if __name__ == '__main__':
    print("product1 description:")
    get_product_details("http://www.aprisin.com.sg/p-748-littletikespoptunesguitar.html")
    print("\n\nproduct2 description:")
    get_product_details("http://www.aprisin.com.sg/p-1052-172083littletikesclassiccastle.html")
You'll have to parse every single item in the list to extract only the text from the HTML tags.
BTW, keep in mind that this is a poor solution. The right one would be getting BeautifulSoup to parse the HTML source without inserting extra closing tags.
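For the text-extraction step, something like this rough sketch could work. It assumes desc_list holds what the code above puts in it (the ResultSet from select() and the .contents lists). The helper name items_to_text and the sample markup are made up for illustration; they are not part of the original code or the real page.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag


def items_to_text(items):
    """Flatten nested lists of bs4 nodes into a list of non-empty text strings."""
    texts = []
    for item in items:
        if isinstance(item, Tag):
            # get_text() drops the markup and keeps only the strings inside the tag
            text = item.get_text(" ", strip=True)
        elif isinstance(item, NavigableString):
            text = str(item).strip()
        elif isinstance(item, (list, tuple)):
            # ResultSet and .contents are both lists, so recurse into them
            texts.extend(items_to_text(item))
            continue
        else:
            text = str(item).strip()
        if text:
            texts.append(text)
    return texts


# Hypothetical sample, loosely modelled on the product block quoted in this thread
sample = BeautifulSoup(
    '<div><h1>Name </h1><span>Code : X1</span></div>', "html.parser"
)
sample_list = [sample.select("div"), sample.div.contents]
sample_texts = items_to_text(sample_list)
print(sample_texts)
```

Since the loop in get_product_details() appends both a parent tag and its children, the same text will show up more than once, so you would probably still need to de-duplicate the result.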

Here, if I print the content of soup:

soup = get_soup(url)
print(soup)
I'm getting this (just one part of the soup content):
Output:
<div id="detail-right"> <h1 id="detail-name">LIttle Tikes PopTunes™ GUITAR </h1> <span style="color: #000; font-size: 12px; font-weight: normal;">Product Code : LT636226</span><br/> <!-- price update attributes begin --> <span class="price"><span class="linethrough">S$49.00</span> S$39.00</span></div></div></div></form></div></div></div></div></div></body></html>
bs4 is closing the <body> and <html> tags in the middle of the original HTML source code.
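One thing that may be worth trying is a different parser, because each parser BeautifulSoup supports repairs broken markup in its own way. The sketch below parses the same string with every parser that happens to be installed so the trees can be compared. The BROKEN string and the parse_with_each helper are invented for illustration, lxml and html5lib are optional extra packages, and I have not checked whether any of them actually fixes this particular page.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical malformed markup: the document is closed before the last <div>
BROKEN = (
    '<html><body><div id="a"><p>one</div></body></html>'
    '<div id="qty">two</div>'
)


def parse_with_each(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Parse markup with every available parser and return {parser name: soup}."""
    results = {}
    for name in parsers:
        try:
            results[name] = BeautifulSoup(markup, name)
        except FeatureNotFound:
            # bs4 raises this when the requested parser is not installed
            continue
    return results


trees = parse_with_each(BROKEN)
for name, soup in trees.items():
    # Check whether the element after the premature </html> is still reachable
    print(name, "->", soup.find(id="qty") is not None)
```

If one of the parsers keeps the tags after the early closing tags in a sensible place, passing its name to BeautifulSoup in get_soup() would be simpler than scanning for start_tag and end_tag.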


Messages In This Thread
RE: how to make my product description fetching function generic? - by gontajones - Jun-29-2018, 09:52 AM
