 How do I get rid of the HTML tags in my output?
How do I remove HTML tags from the following code? This is what I've tried:

import collections
import itertools
import sys
import csv
import glob
import re

def striphtml(data):
    p = re.compile(r'<.*?>')
    return p.sub('', data)

for filepath in glob.glob('**/*.html', recursive=True):
    with open(filepath) as f:
        before = collections.deque(maxlen=10)
        for line in f:
            if 'apple' in line:
                sys.stdout.writelines(itertools.islice(f, 10))
            results = before.append(line)
blah = striphtml(results)

The printed output still has HTML tags in it. I don't have to do it with regex; whatever is easiest should be fine.
Use html2text; look at this post.
Also beware that the result will not always look good: people who write HTML rarely have plain-text output in mind.
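
For what it's worth, the tags survive in the code above because sys.stdout.writelines() prints the raw lines before striphtml() is ever called, and before.append(line) returns None, so results never holds any text. Below is a minimal sketch of one way to restructure it so each line is stripped before it is printed. It uses the standard library's html.parser instead of html2text so it runs without installing anything; the names html_to_text, lines_after_match and _TextExtractor are made up for this illustration. If html2text is installed, html2text.html2text(markup) could be called in place of html_to_text.

```python
import io
import itertools
import re
from html.parser import HTMLParser

TAG_RE = re.compile(r'<.*?>')


def striphtml(data):
    """Regex approach from the question: drop anything shaped like <...>."""
    return TAG_RE.sub('', data)


class _TextExtractor(HTMLParser):
    """Hypothetical helper that keeps only the text between tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text; entities such as &amp; arrive decoded.
        self.parts.append(data)


def html_to_text(markup):
    """Return the text content of an HTML fragment (illustrative name)."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return ''.join(parser.parts)


def lines_after_match(lines, keyword='apple', count=10):
    """Yield the `count` lines that follow each line containing `keyword`."""
    it = iter(lines)
    for line in it:
        if keyword in line:
            yield from itertools.islice(it, count)


if __name__ == '__main__':
    # Stand-in for an open file; any iterable of lines works the same way.
    sample = io.StringIO(
        '<p>apple</p>\n<b>first</b>\n<i>second &amp; third</i>\n'
    )
    for line in lines_after_match(sample, count=2):
        # Strip the markup from each line before writing it out.
        print(html_to_text(line), end='')
```

To use it on real files, pass the open file object from the glob loop to lines_after_match in place of sample. Note that the parser treats the contents of script and style elements as text too, so those would still show up in the output.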
