Python Forum
How to remove duplicate elements in HTML?
#1
Hello to all,

I'm trying to clean an HTML file that has repeated paragraphs within the body. Below are the input file and the expected output.

Input.html https://jsfiddle.net/97ptc0Lh/4/

Output.html https://jsfiddle.net/97ptc0Lh/1/

I've been trying the following code using BeautifulSoup, but I don't know why it is not working: the resulting list CleanHtml still contains the repeated elements (paragraphs) that I'd like to remove. I already asked here, but there has not been much progress yet.

from bs4 import BeautifulSoup

fp = open("Input.html", "rb")
soup = BeautifulSoup(fp, "html5lib")

Uniques = set()
CleanHtml = []

for element in soup.html:
    if element not in Uniques:
        Uniques.add(element)
        CleanHtml.append(element)

print(CleanHtml)
Thanks in advance for any help.
#2
Hi, is there something wrong with my question? It was my first question and nobody answered. Thanks
#3
Quote:Hi, is there something wrong with my question? It was my first question and nobody answered. Thanks
I don't believe anyone is ignoring you. Speaking for myself, I think of HTML as a delivery medium that is generated on the fly and so cannot be thought of as static, so why bother?

That being said, you may have a legitimate reason for doing so, but again, it is an unusual request.
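
If you do need it, here is a minimal sketch of one way to go about it. As far as I can tell, the loop in the first post iterates over soup.html, which only yields the direct children of the html element (head and body), so the individual paragraphs are never compared with each other. The sketch below walks the p elements instead and drops any whose markup has already been seen. The sample HTML, the function name remove_duplicate_paragraphs, and the use of the built-in "html.parser" instead of html5lib are my own assumptions, since the linked Input.html is not reproduced in this thread. It also assumes the paragraphs are not nested inside one another.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input; the real Input.html from the thread is not shown here.
SAMPLE_HTML = """<html><head><title>Demo</title></head><body>
<p>First paragraph</p>
<p>Second paragraph</p>
<p>First paragraph</p>
<p>Third paragraph</p>
<p>Second paragraph</p>
</body></html>"""


def remove_duplicate_paragraphs(html):
    """Return the HTML with repeated <p> elements removed, keeping the first of each."""
    # "html.parser" is an assumption; the original post used "html5lib".
    soup = BeautifulSoup(html, "html.parser")
    seen = set()
    # find_all returns a list, so removing elements while looping is safe.
    for p in soup.find_all("p"):
        key = str(p)  # compare by full markup (tag, attributes, and text)
        if key in seen:
            p.decompose()  # remove the repeated paragraph from the tree
        else:
            seen.add(key)
    return str(soup)


cleaned = remove_duplicate_paragraphs(SAMPLE_HTML)
print(cleaned)
```

Comparing on str(p) treats two paragraphs as duplicates only when their markup is identical. If paragraphs that differ only in attributes or whitespace should also count as duplicates, p.get_text(strip=True) could be used as the key instead.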