How to clean HTML content using BeautifulSoup in Python 3.6?

PrateekG · (This post was last modified: Apr-26-2018, 08:13 AM by PrateekG.)

Hi Buran,

I am getting a JSON response from an e-commerce site. The response contains many attributes, and the HTML content is in the 'body_html' attribute.
After getting the response, I am storing only the HTML content in a dataframe.
If there is an alternative approach, please suggest that as well.
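As one possible alternative, if the dataframe is only a holding place for the descriptions, a plain list may be enough. The sketch below is only an illustration: the payload shape (a top-level "products" list) and the sample values are assumptions, since the real response is not shown in this post.

```python
import json

# Hypothetical response body; the real payload shape is not shown in the post.
raw = (
    '{"products": ['
    '{"id": 1, "body_html": "<p>Soft shirt</p>"}, '
    '{"id": 2, "body_html": null}'
    ']}'
)

response = json.loads(raw)

# Collect only the non-empty HTML descriptions, skipping missing or null values.
descriptions = [
    product["body_html"]
    for product in response.get("products", [])
    if product.get("body_html")
]
print(descriptions)
```

Filtering out null values at this stage means the later parsing step never has to handle a missing description.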

I am trying the code below:

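from bs4 import BeautifulSoup

product_description = data["body_html"]  # assumed to be a pandas Series of HTML strings

def filter_product_description(product_description):
    """Return the text of every whitelisted tag found in the HTML descriptions."""
    whitelist = ['p', 'h1', 'b', 'strong', 'span']
    keep = []
    # Iterate over the values directly; Series.all() returns a single bool, not the values.
    for html_description in product_description:
        soup = BeautifulSoup(html_description, "html.parser")
        for tag in soup.find_all(True):
            # Compare the tag's name, not the Tag object itself, against the whitelist.
            if tag.name in whitelist:
                keep.append(tag.get_text(strip=True))
    return keep

res = filter_product_description(product_description)
print(res)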

I want to use this function to clean up the HTML content, so that it returns only the text inside the tags listed in the whitelist.
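One way to read this requirement, as a minimal sketch rather than a definitive solution: keep the whitelisted tags in place and unwrap everything else, so the text of non-whitelisted tags survives but their markup does not. The names `clean_html`, `WHITELIST`, and the sample string are hypothetical, introduced only for illustration.

```python
from bs4 import BeautifulSoup

# Same tags as the whitelist in the post; the constant name is an assumption.
WHITELIST = {"p", "h1", "b", "strong", "span"}

def clean_html(html, whitelist=WHITELIST):
    """Keep whitelisted tags; unwrap every other tag but keep its contents."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(True):
        if tag.name not in whitelist:
            # unwrap() removes the tag itself and leaves its children in place.
            tag.unwrap()
    return str(soup)

# Hypothetical product description used only to exercise the function.
sample = '<div><p>Soft <b>cotton</b> shirt</p><a href="#">Buy</a></div>'
cleaned = clean_html(sample)
print(cleaned)
```

If the contents of a rejected tag should be dropped as well (for example script or style blocks), `tag.decompose()` can be used instead of `tag.unwrap()`. To apply this to a dataframe column, something like `data["body_html"].apply(clean_html)` should work, assuming the column holds only strings.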

Thanks!
