Python Forum
How to clean html content using BeautifulSoup in Python 3.6?
#3
Hi Buran,

I am getting a JSON response from an e-commerce site. The response contains many attributes, and the HTML content is in the 'body_html' attribute.
After getting the response, I am storing only the HTML content in a dataframe.
If there is an alternative approach, please suggest that as well.
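
As one possible alternative to a dataframe, the HTML strings could be pulled out of the decoded JSON into a plain list. This is only a minimal sketch: the "products" wrapper key, the sample payload, and the helper name extract_body_html are assumptions for illustration, since the real response shape is not shown here.

```python
import json

# Hypothetical response body; the real payload's shape is an assumption.
raw = '{"products": [{"id": 1, "body_html": "<p>First</p>"}, {"id": 2, "body_html": null}]}'

def extract_body_html(payload, key="body_html"):
    """Collect the non-empty `key` values from each product in a decoded JSON payload."""
    products = payload.get("products", [])  # assumed wrapper key
    return [product[key] for product in products if product.get(key)]

descriptions = extract_body_html(json.loads(raw))
print(descriptions)
```

Entries whose 'body_html' is missing or null are skipped, so later parsing only sees real strings.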

I am trying the code below:
from bs4 import BeautifulSoup

product_description = data["body_html"]  # column of HTML strings

def filter_product_description(product_description):
    whitelist = ['p', 'h1', 'b', 'strong', 'span']
    keep = []
    # Iterate over the HTML strings directly; .all() returns a single boolean, not the values
    for html_description in product_description:
        soup = BeautifulSoup(html_description, "html.parser")
        for tag in soup.find_all(True):
            # Compare the tag's name, not the Tag object itself, with the whitelist
            if tag.name in whitelist:
                keep.append(tag)
    return keep

res = filter_product_description(product_description)
print(res)
I want to use this function to clean up the HTML content, so that it returns only the text inside the tags listed in the whitelist.
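
The cleanup described above could be sketched roughly as follows. It assumes one particular reading of the goal: tags outside the whitelist are removed but their text is kept, and whitelisted tags stay in place. The function name clean_html is made up for illustration; find_all and unwrap are standard BeautifulSoup calls.

```python
from bs4 import BeautifulSoup

WHITELIST = {"p", "h1", "b", "strong", "span"}  # tags listed in the post

def clean_html(html, whitelist=WHITELIST):
    """Return `html` with every tag not in `whitelist` unwrapped (its contents are kept)."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(True):
        if tag.name not in whitelist:
            tag.unwrap()  # remove the tag itself, keep its children in place
    return str(soup)

sample = '<div><p>Hello <a href="#">world</a></p><ul><li>x</li></ul></div>'
print(clean_html(sample))
```

If the descriptions are held in a pandas Series, something like product_description.apply(clean_html) should apply this to each row. If the content of non-whitelisted tags should be dropped entirely, tag.decompose() could be used in place of tag.unwrap(), though that also discards any whitelisted tags nested inside them.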

Thanks!


Messages In This Thread
RE: How to clean html content using BeautifulSoup in Python 3.6? - by PrateekG - Apr-26-2018, 08:13 AM
