Mar-08-2025, 01:07 PM
If you run the program twice, the content of "output_file.htm" is also merged into "output_file.htm", while "output_file.htm" keeps getting new content, because the data is read from and written to the same file. I tried this on an NVMe drive and the file size grew very fast. Afterwards I had to delete 4 GiB of data because of this silly mistake.
Example with Path objects
from pathlib import Path
from shutil import copyfileobj


def merge(path: str | Path, glob: str, output: str | Path, show: bool = False) -> None:
    """
    Merge files found in path by glob pattern.
    All data is written to output.

    Args:
        path (str | Path): Path to find files
        glob (str): glob pattern to find files in path
        output (str | Path): Output file
        show (bool, optional): Print processed file. Defaults to False.
    """
    # Ensure that output is an absolute Path object,
    # so the comparison below also works for relative paths
    output = Path(output).resolve()

    # excluding output file from the list of files
    files = [file for file in Path(path).glob(glob) if file.resolve() != output]

    # keep in mind, that the order of files is not given
    # sorting files by modification time
    # but I guess it's not what you want
    files.sort(key=lambda file: file.stat().st_mtime)

    if not files:
        return

    with open(output, "wb") as fd_out:
        for file in files:
            if show:
                print(f"Processing {file}")
            with file.open("rb") as fd_in:
                copyfileobj(fd_in, fd_out)


merge(".", "*.txt", "output_file.txt", show=True)
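If the modification time is not the order you want, sorting by file name is one alternative. This is only a sketch; `sort_by_name` is a hypothetical helper and not part of the code above:

```python
from pathlib import Path


def sort_by_name(files: list[Path]) -> list[Path]:
    """Return the files sorted case-insensitively by file name."""
    return sorted(files, key=lambda file: file.name.lower())


ordered = sort_by_name([Path("b.txt"), Path("A.txt"), Path("c.txt")])
print([file.name for file in ordered])  # ['A.txt', 'b.txt', 'c.txt']
```

Inside merge, this would replace the files.sort(...) line, for example with files = sort_by_name(files).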
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!