Python Forum
How do I read the HTML files in a directory and write the content into a CSV file?
#1
I am trying to read all the HTML files in a directory and write them into a CSV file. Each row in the CSV file will contain the contents of one HTML file.

So far I only manage to get one HTML file into the CSV file, as a single row.

import csv
import fnmatch
import os

directory = "directory/"

for dirpath, dirs, files in os.walk(directory):
    for filename in fnmatch.filter(files, '*.html'):
        with open(os.path.join(dirpath, filename)) as f:
            html = f.read()
        if 'apples and oranges' in html:
            # output.csv is reopened in 'w' mode for every matching file
            with open('output.csv', 'w', newline='') as out:
                writer = csv.writer(out)
                writer.writerow([html])
Currently, output.csv ends up with only one row, containing a single HTML file.
#2
What you need to do is scrape the contents of the HTML.
There are several tools for this, and each suits certain types of HTML content.
There is a quick tutorial on this forum, written by Snippsat (it applies to both local HTML files and web pages):
Web scraping part 1
Web scraping part 2
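As for why only one row shows up: opening a file in 'w' mode truncates it, so an output file that is reopened inside the loop only ever holds the last row written. Here is a minimal sketch of one way to handle both points, assuming you want one row per file holding the file name and its visible text. It uses only the standard library (html.parser) rather than any particular tool from the tutorials, and the names TextExtractor, html_to_text and html_dir_to_csv are made up for this example.

```python
import csv
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text fragments of an HTML string joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def html_dir_to_csv(directory, output_path, phrase=None):
    """Write one (file name, text) row per HTML file; return the number of rows.

    If phrase is given, only files whose raw HTML contains it are written.
    """
    rows = 0
    # Open the output once, before the loop, so earlier rows are not truncated.
    with open(output_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in sorted(Path(directory).rglob("*.html")):
            html = path.read_text(encoding="utf-8", errors="replace")
            if phrase is None or phrase in html:
                writer.writerow([path.name, html_to_text(html)])
                rows += 1
    return rows
```

Calling html_dir_to_csv("directory/", "output.csv", phrase="apples and oranges") would then give one row per matching file. If you would rather keep the raw HTML in the row, as in the original code, write html instead of html_to_text(html). The UTF-8 encoding is an assumption about your files.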