Thread: How do I get rid of the HTML tags in my output? (Python Forum, https://python-forum.io, /thread-20336.html)
How do I get rid of the HTML tags in my output? - glittergirl - Aug-05-2019

How do I remove the HTML tags from the output of the following code? This is what I've tried:

    import collections
    import itertools
    import sys
    import csv
    import glob
    import re

    def striphtml(data):
        p = re.compile(r'<.*?>')
        return p.sub('', data)

    for filepath in glob.glob('**/*.html', recursive=True):
        with open(filepath) as f:
            before = collections.deque(maxlen=10)
            for line in f:
                if 'apple' in line:
                    sys.stdout.writelines(before)
                    sys.stdout.write(line)
                    sys.stdout.writelines(itertools.islice(f, 10))
                    break
                results = before.append(line)
                blah = striphtml(results)
                print(blah)

The printed output still has HTML tags in it. I don't have to use a regex; whatever is easiest is fine.

RE: How do I get rid of the HTML tags in my output? - snippsat - Aug-05-2019

Use html2text; look at this post. Also be aware that the result will not always look good: people who write HTML rarely have plain-text output in mind.
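Two things in the posted code likely explain the symptom: the matching lines are written to `sys.stdout` before `striphtml` ever sees them, and `collections.deque.append` returns `None`, so `results` never holds any text. Below is a minimal sketch of one way to restructure it, assuming a standard-library-only approach is acceptable. It swaps in `html.parser` in place of both the regex and the suggested html2text, and the names `strip_html`, `context_around`, and `width` are hypothetical helpers introduced for illustration, not part of the original post. In this sketch the contents of `<script>` and `<style>` elements are kept as text.

```python
import collections
import itertools
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment):
    """Return the fragment with tags removed (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


def context_around(lines, needle, width=10):
    """Return the first line containing `needle`, plus up to `width`
    lines before and after it, as one string with tags stripped.
    Returns an empty string when nothing matches."""
    lines = iter(lines)
    before = collections.deque(maxlen=width)
    for line in lines:
        if needle in line:
            after = itertools.islice(lines, width)
            # Collect first, strip second: tags are removed before printing.
            return strip_html("".join([*before, line, *after]))
        before.append(line)  # append() returns None, so do not assign it
    return ""
```

Inside the original `with open(filepath) as f:` block, the call would be `print(context_around(f, 'apple'), end='')`. Joining the selected lines before stripping lets a tag that spans two lines be handled as one tag. If html2text is installed, `html2text.html2text(text)` could stand in for `strip_html`, although it emits Markdown-style text, not bare text.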