Python Forum
How do I get rid of the HTML tags in my output?
How do I remove HTML tags from the output of the following code? This is what I've tried:

import collections
import itertools
import sys
import csv
import glob
import re

def striphtml(data):
    p = re.compile(r'<.*?>')
    return p.sub('', data)

for filepath in glob.glob('**/*.html', recursive=True):
    with open(filepath) as f:
        before = collections.deque(maxlen=10)
        for line in f:
            if 'apple' in line:
                sys.stdout.writelines(before)
                sys.stdout.write(line)
                sys.stdout.writelines(itertools.islice(f, 10))
                break
            results = before.append(line)
blah = striphtml(results)
print(blah)
The printed output still has HTML tags in it. I don't have to use regex; whatever is easiest should be fine.
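One possible fix, offered as a rough sketch rather than a definitive answer. In the posted code the lines are written straight to sys.stdout before striphtml is ever called, and deque.append returns None, so results never holds any text. The sketch below gathers the context lines first, strips the tags, and prints afterwards. It assumes the goal is the first line containing 'apple' in each file plus up to ten lines on either side. The names strip_tags, context_around and print_matches are hypothetical, and the regex is the same naive pattern from the post, so it will miss tags that span several lines.

```python
import collections
import glob
import itertools
import re

TAG_RE = re.compile(r'<.*?>')  # same naive pattern as in the post


def strip_tags(text):
    """Remove anything that looks like an HTML tag from text."""
    return TAG_RE.sub('', text)


def context_around(lines, needle, n=10):
    """Return up to n lines before the first line containing needle,
    that line itself, and up to n lines after it ([] if no match)."""
    lines = iter(lines)
    before = collections.deque(maxlen=n)
    for line in lines:
        if needle in line:
            return list(before) + [line] + list(itertools.islice(lines, n))
        before.append(line)  # append() returns None, so don't assign it
    return []


def print_matches(pattern='**/*.html', needle='apple'):
    """Print the tag-free context for every file matching pattern."""
    for filepath in glob.glob(pattern, recursive=True):
        with open(filepath) as f:
            context = context_around(f, needle)
        print(strip_tags(''.join(context)), end='')
```

Calling print_matches() from the directory that holds the HTML files should print the matching context with the tags removed. If the markup is messy, a real HTML parser (for example the standard library's html.parser) is likely to be more reliable than a regex.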


Messages In This Thread
How do I get rid of the HTML tags in my output? - by glittergirl - Aug-05-2019, 07:21 PM
