Python Forum
How to clean html content using BeautifulSoup in Python 3.6?
I have created a pandas DataFrame that stores the HTML content of a product description. The HTML looks like this:

<p><img src="//ad.xyz.com/s/files/1/2352/2977/files/logo-3_large.png?v=1512189111" alt="10mois 5 in 1 Convertible Baby Bed &amp; Desk"><br><br></p>\n<h1><strong>10 mois 5 in 1 Convertible Baby Bed &amp; Desk<br><br></strong></h1>

Now I need to write a function that parses this HTML with BeautifulSoup and returns a filtered version containing only whitelisted tags.

The whitelist is simply a list of the desired tags:
whitelist = ['p', 'h1', 'b', 'i', 'u', 'br', 'li']

Can anyone help me achieve this in Python 3.6?

Thanks!
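
One possible approach, as a minimal sketch rather than a definitive answer: parse the string with BeautifulSoup, walk every tag, and unwrap any tag whose name is not in the whitelist. The function name `clean_html` and the choice of `unwrap()` are assumptions made for illustration. `unwrap()` removes the tag but keeps its children, so the text inside `<strong>` survives; `decompose()` would drop the tag together with its contents instead. The built-in `html.parser` is used so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Whitelist taken from the question above.
WHITELIST = ['p', 'h1', 'b', 'i', 'u', 'br', 'li']


def clean_html(html, whitelist=WHITELIST):
    """Return `html` with every tag not named in `whitelist` removed.

    Hypothetical helper: non-whitelisted tags are unwrapped, so their
    text and child tags are kept while the tag itself disappears.
    """
    soup = BeautifulSoup(html, 'html.parser')
    # find_all(True) matches every tag and returns a list up front,
    # so modifying the tree inside the loop is safe.
    for tag in soup.find_all(True):
        if tag.name not in whitelist:
            tag.unwrap()
    return str(soup)


if __name__ == '__main__':
    sample = ('<p><img src="logo.png" alt="logo"><br><br></p>'
              '<h1><strong>10 mois 5 in 1 Convertible Baby Bed &amp; Desk'
              '<br><br></strong></h1>')
    print(clean_html(sample))
```

To apply it to a DataFrame, something like `df['description'] = df['description'].apply(clean_html)` should work, where `description` is a placeholder for whatever the HTML column is actually called. Note that this sketch leaves the attributes of whitelisted tags untouched; if those also need stripping, that would be an extra step.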


Messages In This Thread
How to clean html content using BeautifulSoup in Python 3.6? - by PrateekG - Apr-26-2018, 05:37 AM
