Python Forum
How do I extract specific lines from HTML files before and after a word?


How do I extract specific lines from HTML files before and after a word? - glittergirl - Aug-05-2019

I am trying to extract the 10 lines before and after the word "apple" from a directory (with subdirectories) full of HTML files. I want to write the lines to a CSV file. Ideally, the CSV file will contain two columns: 1) the HTML filename and 2) the 10 lines before and after the word "apple".

I have done the following:

import glob
import collections
import itertools
import sys
import csv

for filepath in glob.glob('**/*.html', recursive=True):
    with open(filepath) as f:
        before = collections.deque(maxlen=10)
        for line in f:
            if 'apple' in line:
                sys.stdout.writelines(before)
                sys.stdout.write(line)
                sys.stdout.writelines(itertools.islice(f, 10))
            break
        results = before.append(line)
        print(results)
I am currently getting a bunch of rows that say "None" in my terminal when I print the results. What is the issue here?


RE: How do I extract specific lines from HTML files before and after a word? - fishhook - Aug-06-2019

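Why do you expect the "append" method to return a value?
https://docs.python.org/2/library/collections.html#collections.deque.append
The documentation says nothing about a return value. When a function doesn't explicitly return a result, Python always returns None. "append" modifies the deque in place, so "results = before.append(line)" just assigns None to "results", and that is what gets printed.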
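
For what it's worth, here is a minimal sketch of how the script could be restructured so that nothing depends on the return value of "append". The function names extract_context and write_matches, the output filename, and the CSV column names are my own assumptions, not anything from the original post. It also assumes that only the first occurrence of the word in each file matters, and that the files can be read as UTF-8 (undecodable bytes are replaced).

```python
import collections
import csv
import glob
import itertools


def extract_context(lines, word="apple", n=10):
    """Return up to n lines before and after the first line containing word.

    Returns None if no line contains the word. (Hypothetical helper.)
    """
    before = collections.deque(maxlen=n)  # keeps only the last n lines seen
    lines = iter(lines)
    for line in lines:
        if word in line:
            # Consume up to n further lines from the same iterator.
            after = list(itertools.islice(lines, n))
            return list(before) + [line] + after
        # append() changes the deque in place and returns None,
        # so its result is deliberately not assigned to anything.
        before.append(line)
    return None


def write_matches(pattern, out_path, word="apple", n=10):
    """Write one CSV row (filename, context) per file that contains word."""
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["filename", "context"])  # assumed column names
        for filepath in glob.glob(pattern, recursive=True):
            with open(filepath, encoding="utf-8", errors="replace") as f:
                context = extract_context(f, word, n)
            if context is not None:
                # The csv module quotes the embedded newlines for us.
                writer.writerow([filepath, "".join(context)])
```

Calling write_matches('**/*.html', 'apple_context.csv') from the top-level directory should then produce one row per matching file. Keeping the search in its own function makes it easy to check on a plain list of strings before running it over the whole directory tree.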