Hi Buran,
I am getting a JSON response from an e-commerce site. The response contains many attributes, and the HTML content is in the 'body_html' attribute.
After getting the response, I store only the HTML content in a dataframe.
If there is an alternative approach, please suggest that as well.
This is the code I am trying:
from bs4 import BeautifulSoup

product_description = data["body_html"]

def filter_product_description(product_description):
    whitelist = ['p', 'h1', 'b', 'strong', 'span']
    keep = []
    # Loop over each HTML string stored in the dataframe column
    for html_description in product_description:
        soup = BeautifulSoup(html_description, "html.parser")
        for tag in soup.find_all(True):
            # Keep only the tags whose name is in the whitelist
            if tag.name in whitelist:
                keep.append(tag)
    return keep

res = filter_product_description(product_description)
print(res)

I want to use this function to clean up the HTML content, so that it returns only the text inside the tags listed in the whitelist.
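To make the goal concrete, here is a minimal sketch of the behaviour I am after, written with only the standard library's html.parser instead of BeautifulSoup. The names WhitelistTextExtractor and extract_whitelisted_text and the sample HTML are made up for illustration, and the sketch assumes every whitelisted tag is properly closed, so it may not cope with messy real-world markup.

```python
from html.parser import HTMLParser

# Same tag names as the whitelist in the function above.
WHITELIST = {"p", "h1", "b", "strong", "span"}


class WhitelistTextExtractor(HTMLParser):
    """Collects text that appears inside at least one whitelisted tag."""

    def __init__(self, whitelist):
        super().__init__()
        self.whitelist = whitelist
        self.depth = 0    # number of currently open whitelisted tags
        self.chunks = []  # text fragments found inside those tags

    def handle_starttag(self, tag, attrs):
        if tag in self.whitelist:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.whitelist and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text only while inside a whitelisted tag.
        if self.depth > 0 and text:
            self.chunks.append(text)


def extract_whitelisted_text(html, whitelist=WHITELIST):
    """Return the text found inside whitelisted tags, joined by spaces."""
    parser = WhitelistTextExtractor(whitelist)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical product description, for illustration only.
sample = (
    "<div>skip me<p>Soft <b>cotton</b> shirt</p>"
    "<ul><li>ignored</li></ul><h1>Details</h1></div>"
)
print(extract_whitelisted_text(sample))  # prints: Soft cotton shirt Details
```

If the column holds only strings, something like data["body_html"].apply(extract_whitelisted_text) should apply this to every row; missing values would need to be handled first.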
Thanks!