Python Forum

Full Version: Merge htm files with shutil library (TypeError: 'module' object is not callable)
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
why is my code not working?

import shutil

filenames = shutil('*.htm')  # list of all .htm files in the directory

with open('output_file.htm','wb') as wfd:
    for f in filenames:
        with open(f,'rb') as fd:
            shutil.copyfileobj(fd, wfd)
I get this error:

Error:
Traceback (most recent call last): File "E:\Carte\BB\17 - Site Leadership\alte\Ionel Balauta\Aryeht\Task 1 - Traduce tot site-ul\Doar Google Web\Andreea\Meditatii\Sedinta 31 august 2022\merge txt - versiune 2 .py", line 3, in <module> filenames = shutil('*.htm') # list of all .htm files in the directory TypeError: 'module' object is not callable
Use
import glob
filenames = glob.glob('*.htm')
(Aug-28-2022, 07:04 AM)Gribouillis Wrote: [ -> ]Use
import glob
filenames = glob.glob('*.htm')

yes, with globe works. But I still don't understand why does shutil is not working. ?!
(Aug-28-2022, 07:07 AM)Melcu54 Wrote: [ -> ]But I still don't understand why does shutil is not working. ?!
Because shutil is a module. It is not a function that returns a list of files.
Line 3: why are you trying to call the shutil module as a function?
(Aug-28-2022, 07:10 AM)ndc85430 Wrote: [ -> ]Line 3: why are you trying to call the shutil module as a function?

ok, now it works. Thanks a lot !

import shutil
import os
import glob

import glob
filenames = glob.glob('*.htm')

with open('output_file.htm','wb') as wfd:
    for f in filenames:
        with open(f,'rb') as fd:
            shutil.copyfileobj(fd, wfd)
If you run the program twice, then also the content of "ouput_file.htm" is included to "output_file.htm", but "output_file.htm" gets new content, because the output is read and written from the same file. I tried this on a NVMe and file size was growing very fast. I deleted afterward 4 GiB data because of this silly mistake.

Example with Path objects
from pathlib import Path
from shutil import copyfileobj


def merge(path: str|Path, glob: str, output: str|Path, show:bool=False) -> None:
    """
    Merge files found in path by glob pattern.
    All data is written to output.

    Args:
        path (str | Path): Path to find files
        glob (str): glob pattern to find files in path
        output (str | Path): Output file
        show (bool, optional): Print processed file. Defaults to False.
    """
    # Ensure that output is a Path object
    output = Path(output)
    
    # excluding output file from the list of files
    files = [file for file in Path(path).glob(glob) if file != output]
    
    # keep in mind, that the order of files is not given
    # sorting files by modification time
    # but I guess it's not what you want
    files.sort(key=lambda file: file.stat().st_mtime)

    if not files:
        return

    with open(output, "wb") as fd_out:
        for file in files:
            if show:
                print(f"Processing {file}")

            with file.open("rb") as fd_in:
                copyfileobj(fd_in, fd_out)


merge(".", "*.txt", "output_file.txt", show=True)
htm or html files are text files.

I don't have any .htm files, but I do have .html files!

Why on earth you might want to shove all the htm* files together in 1 text file, I have no idea! How does that help? Why not put them all in a zip file?

But you could do it like this, just using Path from pathlib:

from pathlib import Path

destination = Path('/home/pedro/temp/output.txt')
source = Path('/var/www/html/')

files_list = sorted(source.glob('*.html'))

with destination.open(mode='a') as d:    
    for file in files_list:        
        htm = source / file # looks like PosixPath('/var/www/html/index.html')
        with htm.open() as f:
            html = f.read()
            d.write(html)
I found d.write_text(html) did not work, not sure why, path lib docs seem to think it should work:

Quote:Traceback (most recent call last):
File "<pyshell#19>", line 6, in <module>
d.write_text(html)
AttributeError: '_io.TextIOWrapper' object has no attribute 'write_text'