Python Forum
Delete empty text files [SOLVED]
#1
Hello everybody,

I have a folder with many text files. Some of them are empty but contain multiple blank lines (for example, someone opened an empty text file, pressed Enter and saved it). I want a script that checks every file in a given path and deletes all of those files. I've managed to delete truly empty files, but I haven't yet managed to delete the files that contain only blank lines. This is my code so far:

#!/usr/bin/env python3

#Imports
import sys
import os

#Folder configuration
def remove_empty(path):
    print(list(os.walk(path)))
    for (dirpath, folder_names, files) in os.walk(path):
        for filename in files:
            file_location = dirpath + '/' + filename
            if os.path.isfile(file_location):
                if os.path.getsize(file_location) == 0:
                    os.remove(file_location)

#Deletion
if __name__ == "__main__":
    path = 'C:\Path\to\folder\'
    remove_empty(path)
    
#End
sys.exit()
#2
You could use a function such as
def is_blank_file(filename, chunk_size=1024):
    with open(filename, 'rb') as fh:
        while True:
            b = fh.read(chunk_size)
            if not b:
                return True
            if b.strip():
                return False
This returns True if the file contains only ASCII whitespace characters (space, tab, newline, carriage return, vertical tab, form feed) and False if it contains any other byte.
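To see what counts as "blank" for this function, here is a small sketch that repeats it so it runs on its own and tries it on a few temporary files. `bytes.strip()` without arguments removes only ASCII whitespace, so one caveat is worth flagging: a file saved with a UTF-8 byte order mark (which some Windows editors add) is not treated as blank, because the BOM bytes are not whitespace. The sample file names are made up for this illustration.

```python
import os
import tempfile

def is_blank_file(filename, chunk_size=1024):
    # Same logic as above: read in chunks and stop at the first
    # chunk that contains a non-whitespace byte.
    with open(filename, 'rb') as fh:
        while True:
            b = fh.read(chunk_size)
            if not b:
                return True   # reached EOF without finding content
            if b.strip():
                return False  # found a non-whitespace byte

samples = {
    "empty.txt": b"",
    "newlines.txt": b"\r\n\r\n\r\n",
    "mixed_ws.txt": b" \t\n\x0b\x0c ",
    "text.txt": b"\n\nhello\n",
    "bom_only.txt": b"\xef\xbb\xbf\r\n",  # UTF-8 BOM followed by a newline
}

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, data in samples.items():
        p = os.path.join(tmp, name)
        with open(p, "wb") as fh:
            fh.write(data)
        results[name] = is_blank_file(p)

for name, blank in results.items():
    print(f"{name:14} blank={blank}")
```

If BOM-prefixed files should also be deleted, the check would need to skip a leading `b"\xef\xbb\xbf"` before stripping.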
#3
Use pathlib.Path for better abstraction.


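This cannot work, because the closing quote is escaped by the last \:
path = 'C:\Path\to\folder\'
Better:
path = r'C:\Path\to\folder'
This is a raw string, so backslash escapes are not processed. The only exception is that a backslash still keeps a following quote (' or ") from ending the string, which is why even a raw string cannot end with a single backslash.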
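A small sketch to check this yourself: it compiles the problematic line as source text and catches the `SyntaxError`. It also lists a few working spellings of the placeholder path `C:\Path\to\folder` and shows a second trap with normal (non-raw) strings: `\t` becomes a tab and `\f` a form feed, so `'C:\Path\to\folder'` without the `r` prefix would not be the intended path either. The helper name `compiles` is made up for this example.

```python
from pathlib import PureWindowsPath

def compiles(source: str) -> bool:
    # True if the source text is valid Python syntax.
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

# Source lines held as text (the outer strings are raw, so they are literal).
original_line = r"path = 'C:\Path\to\folder\'"   # from the question
raw_line = r"path = r'C:\Path\to\folder\'"       # raw string, trailing backslash
fixed_line = r"path = r'C:\Path\to\folder'"      # raw string, no trailing backslash

# Working ways to spell the folder path:
p_raw = r"C:\Path\to\folder"              # raw string
p_doubled = "C:\\Path\\to\\folder"        # normal string, each backslash doubled
p_slashes = "C:/Path/to/folder"           # forward slashes also work on Windows
p_pure = str(PureWindowsPath(p_slashes))  # pathlib turns / into \

# The non-raw trap: "\t" is a tab and "\f" is a form feed.
gotcha = "\to\folder"
```

Forward slashes or `pathlib` avoid the escaping question entirely, which is one reason the `pathlib.Path` suggestion above is convenient.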

sys.exit() is not required at the end: when only the main thread is running and there are no further statements and no loop, the program ends on its own.

To know whether a file contains only whitespace, you have to read the whole file, which is a problem for big files. So you need a size limit: read only the files that are small enough, strip their content, and delete the file if the result is an empty str/bytes.

The annotations are not mandatory, but they tell the programmer which types are expected.
Python does not check at runtime that the right types are used. Annotations mainly help tools, for example an IDE,
which can then give you better hints. You'll probably use them at some point.
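A short illustration of that point, using two functions made up for this example (`double` and `file_name`): the `int` annotation is not enforced when the function is called. With `from __future__ import annotations` (PEP 563), annotations are stored as strings and not evaluated, which is also why a `str | Path` annotation like the one on `remove_empty` works on Python versions older than 3.10.

```python
from __future__ import annotations

from pathlib import Path

def double(x: int) -> int:
    # "int" is only a hint; Python does not check it when called.
    return x * 2

def file_name(p: str | Path) -> str:
    # Accept either a str or a Path and normalise to Path,
    # the same pattern remove_empty() uses for its root argument.
    return Path(p).name

as_int = double(21)
as_str = double("ab")           # runs fine: "ab" * 2
hints = double.__annotations__  # strings, because of the __future__ import
```

A static checker such as an IDE inspection would flag `double("ab")`; the interpreter itself does not.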
Code:
#!/usr/bin/env python3
from __future__ import annotations

from pathlib import Path

ONE_MIB = 1024 * 1024  # 1 MiB

def remove_empty(root: str | Path, max_size: int = ONE_MIB) -> None:
    """
    Remove all files that are 0 bytes in size.
    Remove all files that are at most max_size bytes and contain only whitespace.
    The size limit avoids reading big files into memory just to strip them.
    """
    root = Path(root)
    for path in root.rglob("*"):
        if not path.is_file():
            # we want only real files
            # path could also be a directory
            continue

        size = path.stat().st_size
        # If size == 0, unlink.
        # Otherwise, if size is at most max_size, read the whole file into
        # memory and strip whitespace. Empty bytes/str is falsy.

        if size == 0 or size <= max_size and not path.read_bytes().strip():
            path.unlink()


if __name__ == "__main__":
    path = r"C:\Path\to\folder"
    # The original path ended with a backslash, which escapes the quote.
    # This is also true for raw strings.
    remove_empty(path)
I prefer double quotes because I can see them better.
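In `remove_empty`, the condition `size == 0 or size <= max_size and not path.read_bytes().strip()` works because `and` binds tighter than `or` and both operators short-circuit: a file is only read when it is non-empty and at most `max_size` bytes. The sketch below checks that grouping with a hypothetical stand-in, `read`, which records each call instead of touching the disk; `should_delete` is likewise a name made up for this illustration.

```python
calls = []

def read(content: bytes) -> bytes:
    # Stand-in for path.read_bytes(); records every "read".
    calls.append(content)
    return content

def should_delete(size: int, content: bytes, max_size: int) -> bool:
    # Same expression shape as in remove_empty(); the parentheses only
    # make explicit how Python already groups it.
    return size == 0 or (size <= max_size and not read(content).strip())

cases = [
    (0, b"", 10),           # empty: deleted without reading
    (3, b"\n\n\n", 10),     # small, whitespace only: read, deleted
    (3, b"a\n\n", 10),      # small, has text: read, kept
    (50, b"\n" * 50, 10),   # larger than max_size: kept without reading
]
decisions = [should_delete(s, c, m) for s, c, m in cases]
print(decisions)
print(calls)
```

The last case shows the trade-off of the size limit: a whitespace-only file larger than `max_size` is kept.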
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
#4
(Jul-09-2022, 01:34 PM)DeaD_EyE Wrote: To know if a file has only white spaces inside, you have to read the whole file. This is a big problem, if there are big files.
This is not true if you read the file in chunks of, say, 1024 bytes. If a file contains non-blank characters, it is highly probable that some of them appear very early in the file. Reading many large files that contain only blank characters is so improbable that you can neglect this case.
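To make the early-exit argument concrete, here is a variant of the chunked check that also reports how many bytes it read before deciding. It uses `io.BytesIO` in place of real files, and `blank_check_with_count` is a hypothetical name for this illustration.

```python
import io

def blank_check_with_count(fh, chunk_size=1024):
    """Return (is_blank, bytes_read) for a binary file-like object."""
    read_total = 0
    while True:
        chunk = fh.read(chunk_size)
        read_total += len(chunk)
        if not chunk:
            return True, read_total   # EOF: only whitespace was seen
        if chunk.strip():
            return False, read_total  # decided after this chunk

# About 10 MB that starts with text: decided after the first chunk.
big_text = io.BytesIO(b"hello\n" + b"x" * 10_000_000)
# 10 MB of newlines only: every byte has to be read.
big_blank = io.BytesIO(b"\n" * 10_000_000)

text_result = blank_check_with_count(big_text)
blank_result = blank_check_with_count(big_blank)
print(text_result, blank_result)
```

Only a large whitespace-only file costs a full read, and even then at most one chunk is held in memory at a time.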
#5
(Jul-09-2022, 01:36 PM)DeaD_EyE Wrote: Use pathlib.Path for better abstraction.

That worked fine, thank you.
#6
(Jul-09-2022, 01:38 PM)Gribouillis Wrote: This is not true if you read the file by chunks of say 1024 bytes. If the file contains non blank characters, it is highly probable that some of these non blank characters will appear very soon in the file.

In most cases this holds, but I have constructed a case where it does not.

I used it once to simplify using the ESP flash tool.
Instead of telling the flash tool to write at a 4 KiB offset, I used a binary blob whose first 4 KiB were zero bytes.
That way the start address is 0 instead of 0x1000.


I haven't seen normal text files whose first 1024 bytes are whitespace.
If you want to be really safe, read all chunks:
def is_blank_file(filename, chunk_size=1024):
    with open(filename, 'rb') as fh:
        while chunk := fh.read(chunk_size):
            if chunk.strip():
                # early non-whitespace -> fast
                return False

    # No non-whitespace byte was found: the whole file had to be read,
    # which takes longer than the early return above.
    return True
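One way to combine the pieces of this thread: the `pathlib` walk from `remove_empty` plus the chunked check, so no `max_size` cut-off is needed. The name `remove_blank_files` and its `dry_run` parameter are additions for this sketch, not from the thread; with `dry_run=True` (the default here) it only returns the paths it would delete. It collects all matches before deleting anything, so files are not removed while `rglob` is still iterating. The `:=` operator requires Python 3.8+.

```python
from __future__ import annotations

from pathlib import Path

def is_blank_file(path: Path, chunk_size: int = 1024) -> bool:
    # Read chunk by chunk; stop as soon as a non-whitespace byte appears.
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            if chunk.strip():
                return False
    return True  # empty, or only ASCII whitespace

def remove_blank_files(root: str | Path, dry_run: bool = True) -> list[Path]:
    """Delete (or, with dry_run=True, only list) blank files below root."""
    matches = [p for p in Path(root).rglob("*")
               if p.is_file() and is_blank_file(p)]
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

A cautious workflow would be to call `remove_blank_files(r"C:\Path\to\folder")` first, check the returned list, and only then repeat the call with `dry_run=False`.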