Merge htm files with shutil library (TypeError: 'module' object is not callable)

Melcu54 · (This post was last modified: Aug-28-2022, 09:10 AM by Yoriz.)

Why is my code not working?

import shutil

filenames = shutil('*.htm')  # list of all .htm files in the directory

with open('output_file.htm','wb') as wfd:
    for f in filenames:
        with open(f,'rb') as fd:
            shutil.copyfileobj(fd, wfd)

I get this error:

Error: Traceback (most recent call last):
  File "E:\Carte\BB\17 - Site Leadership\alte\Ionel Balauta\Aryeht\Task 1 - Traduce tot site-ul\Doar Google Web\Andreea\Meditatii\Sedinta 31 august 2022\merge txt - versiune 2 .py", line 3, in <module>
    filenames = shutil('*.htm')  # list of all .htm files in the directory
TypeError: 'module' object is not callable

**Gribouillis** · Aug-28-2022, 07:04 AM

Use

import glob
filenames = glob.glob('*.htm')

Melcu54 · (This post was last modified: Aug-28-2022, 07:08 AM by Melcu54.)

(Aug-28-2022, 07:04 AM)Gribouillis Wrote: Use
import glob
filenames = glob.glob('*.htm')

Yes, it works with glob. But I still don't understand why shutil is not working?!

**Gribouillis** · (This post was last modified: Aug-28-2022, 07:09 AM by Gribouillis.)

(Aug-28-2022, 07:07 AM)Melcu54 Wrote: But I still don't understand why shutil is not working?!

Because shutil is a module. It is not a function that returns a list of files.
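
To make the difference concrete, here is a minimal sketch using only the standard library. It checks what can and cannot be called; nothing in it is specific to the original script.

```python
import glob
import shutil

# A module is a namespace that holds functions; it is not a function itself.
print(callable(shutil))              # False: shutil('*.htm') raises TypeError
print(callable(shutil.copyfileobj))  # True: functions inside the module can be called
print(callable(glob.glob))           # True: glob.glob() is what returns the matches

# glob.glob() returns a (possibly empty) list of matching file names
filenames = glob.glob('*.htm')
print(type(filenames).__name__)      # list
```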

ndc85430 · Aug-28-2022, 07:10 AM

Line 3: why are you trying to call the shutil module as a function?

Melcu54 · Aug-28-2022, 07:11 AM

(Aug-28-2022, 07:10 AM)ndc85430 Wrote: Line 3: why are you trying to call the shutil module as a function?

OK, now it works. Thanks a lot!

import shutil
import glob

filenames = glob.glob('*.htm')  # list of all .htm files in the directory

with open('output_file.htm','wb') as wfd:
    for f in filenames:
        with open(f,'rb') as fd:
            shutil.copyfileobj(fd, wfd)

DeaD_EyE · Mar-08-2025, 01:07 PM

If you run the program twice, the second run's glob pattern also matches "output_file.htm", so the output file is read from and written to at the same time. Everything written to it is read back and appended again, so the file keeps growing. I tried this on an NVMe drive and the file size grew very fast. I had to delete 4 GiB of data afterwards because of this silly mistake.

Example with Path objects

from pathlib import Path
from shutil import copyfileobj


def merge(path: str | Path, glob: str, output: str | Path, show: bool = False) -> None:
    """
    Merge files found in path by glob pattern.
    All data is written to output.

    Args:
        path (str | Path): Path to find files
        glob (str): glob pattern to find files in path
        output (str | Path): Output file
        show (bool, optional): Print processed file. Defaults to False.
    """
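    # Ensure that output is an absolute Path object
    output = Path(output).resolve()
    
    # excluding output file from the list of files
    # (resolved paths are compared, so relative and absolute spellings match)
    files = [file for file in Path(path).glob(glob) if file.resolve() != output]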
    
    # keep in mind that glob() returns files in no guaranteed order
    # here they are sorted by modification time,
    # which may not be the order you want
    files.sort(key=lambda file: file.stat().st_mtime)

    if not files:
        return

    with open(output, "wb") as fd_out:
        for file in files:
            if show:
                print(f"Processing {file}")

            with file.open("rb") as fd_in:
                copyfileobj(fd_in, fd_out)


merge(".", "*.txt", "output_file.txt", show=True)
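
If name order is wanted instead of modification time, the same idea could be sketched as below. `merge_by_name` is a made-up name for illustration, and comparing resolved paths is an assumption about how the output file should be recognised; otherwise it follows the approach above.

```python
from pathlib import Path
from shutil import copyfileobj


def merge_by_name(path, pattern, output):
    """Merge files matching pattern in path into output, sorted by file name."""
    output = Path(output).resolve()
    # Skip the output file itself so a second run cannot feed it back into itself
    files = sorted(
        file for file in Path(path).glob(pattern) if file.resolve() != output
    )
    if not files:
        return files

    with output.open("wb") as fd_out:
        for file in files:
            with file.open("rb") as fd_in:
                copyfileobj(fd_in, fd_out)
    return files
```

Calling `merge_by_name(".", "*.htm", "output_file.htm")` twice in a row should leave the output at the same size, because the output file is never part of its own input.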

Pedroski55 · Mar-09-2025, 04:25 PM

htm or html files are text files.

I don't have any .htm files, but I do have .html files!

Why on earth you might want to shove all the htm* files together in 1 text file, I have no idea! How does that help? Why not put them all in a zip file?

But you could do it like this, just using Path from pathlib:

from pathlib import Path

destination = Path('/home/pedro/temp/output.txt')
source = Path('/var/www/html/')

files_list = sorted(source.glob('*.html'))

with destination.open(mode='a') as d:
    for file in files_list:
        # file is already a full path, e.g. PosixPath('/var/www/html/index.html')
        with file.open() as f:
            html = f.read()
            d.write(html)

I found that d.write_text(html) did not work. I'm not sure why; the pathlib docs seem to say it should:

Quote: Traceback (most recent call last):
  File "<pyshell#19>", line 6, in <module>
    d.write_text(html)
AttributeError: '_io.TextIOWrapper' object has no attribute 'write_text'
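
The traceback hints at the likely cause: `d` is the file object returned by `destination.open()`, not a `Path`. `write_text()` is a method of `Path` itself, while file objects only offer `write()`. A small sketch of the difference, using a throwaway temporary directory (the file name `demo.txt` is made up for illustration):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "demo.txt"  # hypothetical file name

    # Path.write_text() opens, writes and closes the file in one call.
    # It replaces any existing content, so it is not a way to append.
    target.write_text("first\n")

    # Path.open() returns an ordinary file object, which has write() only.
    with target.open(mode="a") as d:
        d.write("second\n")
        has_write_text = hasattr(d, "write_text")

    content = target.read_text()

print(has_write_text)  # False
print(repr(content))   # 'first\nsecond\n'
```

So for appending several files in a loop, `d.write(html)` on the open file is the fitting call; `write_text()` would have to be called on `destination` and would overwrite it each time.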
