 How do I get rid of the HTML tags in my output?
#1
How do I remove the HTML tags from the output of the following code? This is what I've tried:

import collections
import itertools
import sys
import csv
import glob
import re

def striphtml(data):
    p = re.compile(r'<.*?>')
    return p.sub('', data)

for filepath in glob.glob('**/*.html', recursive=True):
    with open(filepath) as f:
        before = collections.deque(maxlen=10)
        for line in f:
            if 'apple' in line:
                sys.stdout.writelines(before)
                sys.stdout.write(line)
                sys.stdout.writelines(itertools.islice(f, 10))
                break
            results = before.append(line)
blah = striphtml(results)
print(blah)
The printed code still has HTML tags in it. I don't have to do it in regex; whatever is easiest should be fine.
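A likely reason the tags survive: the matching lines are written straight to sys.stdout before striphtml is ever called, and before.append(line) returns None, so results never holds any text. Below is a minimal sketch of one possible fix that keeps the regex approach: collect the context lines first, then strip them once. The helper name context_around and the sample lines are hypothetical, chosen only for illustration.

```python
import collections
import itertools
import re

TAG_RE = re.compile(r'<.*?>')


def striphtml(data):
    """Remove anything that looks like an HTML tag (same regex as in the question)."""
    return TAG_RE.sub('', data)


def context_around(lines, keyword, n=10):
    """Return up to n lines before the first line containing keyword,
    the matching line itself, and up to n lines after it.
    Returns an empty list when nothing matches."""
    lines = iter(lines)
    before = collections.deque(maxlen=n)
    for line in lines:
        if keyword in line:
            after = list(itertools.islice(lines, n))
            return list(before) + [line] + after
        before.append(line)
    return []


# Hypothetical sample input standing in for the lines of one HTML file.
sample = [
    '<html>\n',
    '<p>pear</p>\n',
    '<p>An <b>apple</b> a day</p>\n',
    '<p>plum</p>\n',
]
chunk = context_around(sample, 'apple', n=1)
print(striphtml(''.join(chunk)), end='')
```

With real files, the same call would take the open file object in place of sample, for example striphtml(''.join(context_around(f, 'apple'))). Note that this regex works line by line, so a tag that spans several lines is not removed.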
#2
Use html2text; look at this post.
Also beware that the result will not always look good: people who write HTML rarely have plain-text output in mind.
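If installing a third-party package is not an option, a rough alternative is the standard library's html.parser module. The sketch below is not html2text and makes no attempt at layout; it only collects text nodes and skips script and style contents. The names TextExtractor and html_to_text are hypothetical, made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        return ''.join(self._chunks)


def html_to_text(markup):
    """Return the text content of an HTML string, with entities decoded."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text()


print(html_to_text('<p>An <b>apple</b> &amp; a pear</p>'))
```

Unlike the regex, a parser copes with tags that span several lines and decodes entities such as &amp;amp;, but the output is still bare text with no formatting.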