Aug-05-2019, 07:21 PM
How do I remove HTML tags from the output of the following code? This is what I've tried:
import collections
import itertools
import sys
import csv
import glob
import re

def striphtml(data):
    p = re.compile(r'<.*?>')
    return p.sub('', data)

for filepath in glob.glob('**/*.html', recursive=True):
    with open(filepath) as f:
        before = collections.deque(maxlen=10)
        for line in f:
            if 'apple' in line:
                sys.stdout.writelines(before)
                sys.stdout.write(line)
                sys.stdout.writelines(itertools.islice(f, 10))
                break
            results = before.append(line)
            blah = striphtml(results)
            print(blah)

The printed output still has HTML tags in it. I don't have to use a regex; whatever is easiest is fine.
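One possible direction, offered as a sketch rather than a definitive fix: in the code above, the lines written with sys.stdout.write and sys.stdout.writelines never pass through striphtml, and deque.append returns None, so results does not hold any text. The version below keeps the same regex but applies it to every context line before printing. The names context_without_tags, print_matches, keyword and width are made up for this illustration.

```python
import collections
import glob
import itertools
import re
import sys

TAG_RE = re.compile(r'<.*?>')


def striphtml(data):
    """Remove anything that looks like an HTML tag from a string."""
    return TAG_RE.sub('', data)


def context_without_tags(lines, keyword='apple', width=10):
    """Return the first line containing `keyword`, plus up to `width`
    lines before and after it, with tags stripped from each line.
    Returns an empty list when no line contains `keyword`."""
    lines = iter(lines)
    before = collections.deque(maxlen=width)
    for line in lines:
        if keyword in line:
            after = itertools.islice(lines, width)
            matched = itertools.chain(before, [line], after)
            return [striphtml(text) for text in matched]
        # append() returns None, so its result is not assigned to anything
        before.append(line)
    return []


def print_matches(pattern='**/*.html'):
    """Hypothetical driver: print the stripped context for each file."""
    for filepath in glob.glob(pattern, recursive=True):
        with open(filepath) as f:
            sys.stdout.writelines(context_without_tags(f))
```

Calling print_matches() would walk the same files as the original loop. Note that a line-by-line regex will miss tags that span several lines and will leave script or style contents in place; if that matters, a real HTML parser (for example html.parser from the standard library, or BeautifulSoup's get_text() if installing a package is acceptable) is likely to be more robust.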