Python Forum
How do I get rid of the HTML tags in my output?
#1
How do I remove HTML tags from the output of the following code? This is what I've tried:

import collections
import itertools
import sys
import csv
import glob
import re

def striphtml(data):
    p = re.compile(r'<.*?>')
    return p.sub('', data)

for filepath in glob.glob('**/*.html', recursive=True):
    with open(filepath) as f:
        before = collections.deque(maxlen=10)
        for line in f:
            if 'apple' in line:
                sys.stdout.writelines(before)
                sys.stdout.write(line)
                sys.stdout.writelines(itertools.islice(f, 10))
                break
            results = before.append(line)
blah = striphtml(results)
print(blah)
The printed output still has HTML tags in it. I don't have to use regex; whatever is easiest is fine.
#2
Use html2text; look at this post.
Also be aware that the output will not always look good, since people who write HTML rarely have plain-text output in mind.
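For reference, here is a minimal sketch of why the posted code keeps its tags and one way it might be restructured. Two things stand out in the original: deque.append() returns None, so results is always None when it reaches striphtml(), and the matched lines are written to sys.stdout before any stripping happens. The sketch below collects the context first and strips it afterwards. It swaps in the standard library's html.parser in place of html2text so that it runs without extra installs; the names strip_tags, context_around, and print_matches are made up for illustration and are not from the thread.

```python
import collections
import glob
import itertools
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text between tags; the tags themselves are dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Return html with markup removed (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def context_around(lines, keyword, n=10):
    """Return up to n lines before the first line containing keyword,
    that line, and up to n lines after it, joined into one string.
    Returns an empty string when keyword never appears."""
    lines = iter(lines)
    before = collections.deque(maxlen=n)
    for line in lines:
        if keyword in line:
            after = itertools.islice(lines, n)
            return "".join([*before, line, *after])
        # append() returns None, so its result is not assigned to anything
        before.append(line)
    return ""


def print_matches(pattern="**/*.html", keyword="apple"):
    """Print the stripped context for every file matching pattern."""
    for filepath in glob.glob(pattern, recursive=True):
        with open(filepath) as f:
            text = strip_tags(context_around(f, keyword))
        if text:
            print(text)
```

Calling print_matches() would mirror the original loop. If html2text is installed, html2text.html2text(text) could be used in place of strip_tags(text); it produces Markdown-style text, so the result will look somewhat different.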