Why is my code not working?
import shutil
filenames = shutil('*.htm') # list of all .htm files in the directory
with open('output_file.htm','wb') as wfd:
    for f in filenames:
        with open(f,'rb') as fd:
            shutil.copyfileobj(fd, wfd)
I get this error:
Traceback (most recent call last):
  File "E:\Carte\BB\17 - Site Leadership\alte\Ionel Balauta\Aryeht\Task 1 - Traduce tot site-ul\Doar Google Web\Andreea\Meditatii\Sedinta 31 august 2022\merge txt - versiune 2 .py", line 3, in <module>
    filenames = shutil('*.htm') # list of all .htm files in the directory
TypeError: 'module' object is not callable
Use
import glob
filenames = glob.glob('*.htm')
(Aug-28-2022, 07:04 AM) Gribouillis wrote:
Use
import glob
filenames = glob.glob('*.htm')
Yes, it works with glob. But I still don't understand why shutil is not working?!
(Aug-28-2022, 07:07 AM) Melcu54 wrote: But I still don't understand why shutil is not working?!
Because shutil is a module. It is not a function that returns a list of files.
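To make the difference concrete, here is a small sketch. The helper name list_htm_files is made up for illustration. Calling the module raises the same TypeError as in the traceback, while glob.glob is a function inside the glob module and returns a list of matching names.

```python
import glob
import shutil


def list_htm_files(pattern="*.htm"):
    """Hypothetical helper: return the file names matching pattern, sorted."""
    # glob is a module too; the callable is the function glob.glob inside it
    return sorted(glob.glob(pattern))


# A module object cannot be called, only the functions it contains
try:
    shutil("*.htm")
except TypeError as exc:
    print("TypeError:", exc)

print(callable(shutil))              # False: shutil is a module
print(callable(shutil.copyfileobj))  # True: a function inside the module
print(list_htm_files())
```

So shutil is still needed for copyfileobj, but the list of files has to come from somewhere else, such as glob.glob.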
Line 3: why are you trying to call the shutil module as a function?
(Aug-28-2022, 07:10 AM) ndc85430 wrote: Line 3: why are you trying to call the shutil module as a function?
OK, now it works. Thanks a lot!
import shutil
import glob
filenames = glob.glob('*.htm')
with open('output_file.htm','wb') as wfd:
    for f in filenames:
        with open(f,'rb') as fd:
            shutil.copyfileobj(fd, wfd)
If you run the program twice, "output_file.htm" already exists and matches '*.htm', so it ends up in the list of input files. The program then reads from the same file it is writing to, and the output keeps being fed back into itself. I tried this on an NVMe drive and the file size grew very fast. I had to delete 4 GiB of data afterward because of this silly mistake.
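One way to avoid this while staying with glob is to drop the output file from the list before opening it for writing. This is only a sketch. The function name merge_htm and its default arguments are made up for illustration.

```python
import glob
import os
import shutil


def merge_htm(pattern="*.htm", output="output_file.htm"):
    """Hypothetical helper: concatenate files matching pattern into output."""
    out_path = os.path.abspath(output)
    # Build the input list first and leave the output file out of it
    filenames = sorted(
        f for f in glob.glob(pattern) if os.path.abspath(f) != out_path
    )
    with open(output, "wb") as wfd:
        for f in filenames:
            with open(f, "rb") as fd:
                shutil.copyfileobj(fd, wfd)
    return filenames
```

Running merge_htm() a second time then rebuilds the output from the same inputs instead of reading the output back in.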
Example with Path objects
from pathlib import Path
from shutil import copyfileobj


def merge(path: str | Path, glob: str, output: str | Path, show: bool = False) -> None:
    """
    Merge files found in path by glob pattern.
    All data is written to output.

    Args:
        path (str | Path): Path to find files
        glob (str): glob pattern to find files in path
        output (str | Path): Output file
        show (bool, optional): Print processed file. Defaults to False.
    """
    # Ensure that output is an absolute Path object, so the comparison below is reliable
    output = Path(output).resolve()
    # Exclude the output file from the list of files
    files = [file for file in Path(path).glob(glob) if file.resolve() != output]
    # Keep in mind that glob() does not guarantee any order.
    # Here the files are sorted by modification time,
    # but I guess that's not what you want.
    files.sort(key=lambda file: file.stat().st_mtime)
    if not files:
        return
    with open(output, "wb") as fd_out:
        for file in files:
            if show:
                print(f"Processing {file}")
            with file.open("rb") as fd_in:
                copyfileobj(fd_in, fd_out)


merge(".", "*.txt", "output_file.txt", show=True)
htm or html files are text files.
I don't have any .htm files, but I do have .html files!
Why on earth you might want to shove all the htm* files together in 1 text file, I have no idea! How does that help? Why not put them all in a zip file?
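The zip idea could look roughly like this. It is a sketch using the standard zipfile module. The function name zip_files and its parameters are made up for illustration.

```python
from pathlib import Path
from zipfile import ZIP_DEFLATED, ZipFile


def zip_files(source, pattern, archive):
    """Hypothetical helper: store every file matching pattern in a zip archive."""
    archive = Path(archive).resolve()
    # Collect the inputs first, skipping the archive itself if it matches
    files = sorted(
        f for f in Path(source).glob(pattern)
        if f.is_file() and f.resolve() != archive
    )
    with ZipFile(archive, "w", compression=ZIP_DEFLATED) as zf:
        for file in files:
            # arcname keeps only the file name, not the full directory path
            zf.write(file, arcname=file.name)
    return [file.name for file in files]
```

Unlike one merged text file, the archive keeps each page as a separate entry, so the files can be extracted again individually.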
But you could do it like this, just using Path from pathlib:
from pathlib import Path

destination = Path('/home/pedro/temp/output.txt')
source = Path('/var/www/html/')
files_list = sorted(source.glob('*.html'))
# mode='a' appends, so running this twice writes the content twice
with destination.open(mode='a') as d:
    for file in files_list:
        # glob() already yields full paths, e.g. PosixPath('/var/www/html/index.html')
        with file.open() as f:
            html = f.read()
        d.write(html)
I found that d.write_text(html) did not work. I'm not sure why; the pathlib docs seem to say it should:
Traceback (most recent call last):
  File "<pyshell#19>", line 6, in <module>
    d.write_text(html)
AttributeError: '_io.TextIOWrapper' object has no attribute 'write_text'
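Going by the traceback, a likely explanation is that d is the file object returned by destination.open(), and file objects only have write(). write_text() is a method of the Path object itself. It opens the file in write mode, so it replaces the content instead of appending. A minimal sketch, with the helper name merge_text and its parameters made up for illustration:

```python
from pathlib import Path


def merge_text(source, pattern, destination):
    """Hypothetical helper: join all matching text files into destination."""
    destination = Path(destination).resolve()
    parts = [
        file.read_text()
        for file in sorted(Path(source).glob(pattern))
        if file.resolve() != destination
    ]
    # write_text() is called on the Path, not on an open file object.
    # It overwrites the file, so everything is written in one call.
    destination.write_text("".join(parts))
    return len(parts)
```

Because everything is read into memory before writing, this fits small text files better than large ones. For big files, the copyfileobj approach earlier in the thread is probably the safer choice.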