Python Forum
How do I read the HTML files in a directory and write the content into a CSV file?
#1
I am trying to read all the HTML files in a directory and write them into a CSV file. Each row in the CSV file will contain the contents of one HTML file.

I seem to be able to only read one HTML file and write that one file into one row of a CSV file.

import csv
import fnmatch
import os

directory = "directory/"

for dirpath, dirs, files in os.walk(directory):
    for filename in fnmatch.filter(files, '*.html'):
        with open(os.path.join(dirpath, filename)) as f:
            html = f.read()
            if 'apples and oranges' in html:
                # Note: mode 'w' truncates output.csv every time this line runs
                with open('output.csv', 'w') as out:
                    writer = csv.writer(out)
                    lines = [[html]]
                    for l in lines:
                        writer.writerow(l)
When I run this, output.csv ends up with only one row, containing the contents of a single HTML file.
#2
What you need to do is scrape the contents of the HTML.
There are several tools to do this, and each works for certain types of HTML content.
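As one small illustration of what "scraping the contents" can mean, here is a minimal sketch that pulls the visible text out of an HTML string using only the standard library's html.parser. The names TextExtractor and html_to_text are made up for this example, and a dedicated scraping library may well be a better fit for messy real-world pages.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()
```

Calling html_to_text(html) on each file's contents would give a plain-text string that is usually easier to store in a CSV cell than raw markup.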
There is a quick tutorial on this forum, written by Snippsat (it applies to local HTML files as well as to web pages):
Web scraping part 1
Web scraping part 2
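Separately from scraping, the single-row result in the posted code most likely comes from opening output.csv in 'w' mode inside the loop: each matching file truncates the CSV and rewrites it, so only the last match survives. A minimal sketch of one way around that is to open the CSV once, before walking the directory. The function name html_files_to_csv and its parameters are made up for this example, and it assumes the files are UTF-8 encoded.

```python
import csv
import fnmatch
import os


def html_files_to_csv(directory, output_path, phrase=None):
    """Write one CSV row per matching HTML file; return the number of rows."""
    rows = 0
    # Open the output once, before the loop, so earlier rows are not overwritten.
    # newline="" lets the csv module control line endings itself.
    with open(output_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for dirpath, dirs, files in os.walk(directory):
            for filename in sorted(fnmatch.filter(files, "*.html")):
                path = os.path.join(dirpath, filename)
                with open(path, encoding="utf-8") as f:
                    html = f.read()
                # With no phrase given, every HTML file is written.
                if phrase is None or phrase in html:
                    writer.writerow([html])
                    rows += 1
    return rows
```

For the original case this would be called as html_files_to_csv("directory/", "output.csv", phrase="apples and oranges"). The csv module quotes embedded newlines and commas, so each file still occupies exactly one row even though it spans many lines in the raw CSV text.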