Python Forum
How do I extract specific lines from HTML files before and after a word?
#1
I am trying to extract the 10 lines before and after the word "apple" from a directory (with subdirectories) full of HTML files. I want to write these lines to a CSV file. Ideally, the CSV file will have two columns: 1) the HTML filename and 2) the 10 lines before and after the word "apple".
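A minimal sketch of the behaviour described above, assuming UTF-8 files and a plain substring match (so "pineapple" also counts as a match). The function name `extract_context`, its parameters, and the output filename `apple_context.csv` are hypothetical. It keeps a `deque` of the last n lines, pulls the next n lines with `itertools.islice`, and writes one CSV row per match. Lines consumed as "after" context are not themselves searched for another match.

```python
import collections
import csv
import glob
import itertools
import os


def extract_context(root=".", word="apple", n=10, out_csv="apple_context.csv"):
    """Hypothetical helper: write one CSV row (filename, context) per line containing `word`.

    Returns the number of matches written.
    """
    matches = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["filename", "context"])
        # '**' with recursive=True also descends into subdirectories
        pattern = os.path.join(root, "**", "*.html")
        for filepath in glob.glob(pattern, recursive=True):
            with open(filepath, encoding="utf-8", errors="replace") as f:
                # Sliding window holding at most the last n lines seen
                before = collections.deque(maxlen=n)
                for line in f:
                    if word in line:
                        # Pull up to the next n lines from the same file iterator
                        after = list(itertools.islice(f, n))
                        context = "".join([*before, line, *after])
                        writer.writerow([filepath, context])
                        matches += 1
                        # The consumed lines become the window for any later match
                        before.append(line)
                        before.extend(after)
                    else:
                        before.append(line)
    return matches
```

Calling `extract_context()` from the top-level directory would scan it recursively and write `apple_context.csv` there; the context column is a single multi-line CSV field, which the `csv` module quotes automatically.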

I have done the following:

import glob
import collections
import itertools
import sys
import csv

for filepath in glob.glob('**/*.html', recursive=True):
    with open(filepath) as f:
        before = collections.deque(maxlen=10)
        for line in f:
            if 'apple' in line:
                sys.stdout.writelines(before)
                sys.stdout.write(line)
                sys.stdout.writelines(itertools.islice(f, 10))
            break
        results = before.append(line)
        print(results)
I am currently getting a bunch of rows that say "None" in my terminal when I print the results. What is the issue here?
#2
Why do you expect the "append" method to return a value?
https://docs.python.org/2/library/collec...que.append
The documentation says nothing about a return value. If a function doesn't explicitly return a result, Python always returns None.
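A small illustration of the point above: `deque.append` mutates the deque in place and returns None, so the window has to be read from the deque itself. In the code from the first post, the window would also need updating inside the `for line in f:` loop: there, `before.append(line)` sits after the loop, and the unconditional `break` ends the loop after the first line of each file. The variable names below are illustrative only.

```python
import collections

window = collections.deque(maxlen=3)
result = window.append("a")  # append mutates the deque in place and returns None
print(result)  # None

for item in ["b", "c", "d"]:
    window.append(item)  # keep using the deque itself, not append's return value

print(list(window))  # ['b', 'c', 'd'] because maxlen=3 dropped the oldest item "a"
```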