Posts: 56
Threads: 23
Joined: Jul 2021
Jul-09-2022, 12:08 PM
(This post was last modified: Jul-09-2022, 02:11 PM by AlphaInc.)
Hello everybody,
I have a folder with many text files. Some of them are empty but contain multiple blank lines (for example, someone opened an empty text file, pressed Enter and saved it). I want to create a script that checks every file in a specified path and deletes all of those files. I've managed to delete files that are truly empty (0 bytes), but the files that contain only blank lines are not deleted yet. This is my code so far:
#!/usr/bin/env python3
#Imports
import sys
import os
#Folder configuration
def remove_empty(path):
    print(list(os.walk(path)))
    for (dirpath, folder_names, files) in os.walk(path):
        for filename in files:
            file_location = dirpath + '/' + filename
            if os.path.isfile(file_location):
                if os.path.getsize(file_location) == 0:
                    os.remove(file_location)
#Deletion
if __name__ == "__main__":
    path = 'C:\Path\to\folder\'
    remove_empty(path)
    #End
    sys.exit()
Posts: 4,686
Threads: 73
Joined: Jan 2018
Jul-09-2022, 01:26 PM
(This post was last modified: Jul-09-2022, 01:26 PM by Gribouillis.)
You could use a function such as
def is_blank_file(filename, chunk_size=1024):
    with open(filename, 'rb') as fh:
        while True:
            b = fh.read(chunk_size)
            if not b:
                return True
            if b.strip():
                return False
This returns True if the file contains only ASCII whitespace characters and False if the file contains a character not in this set.
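As a rough sketch, the function above could be plugged into the os.walk loop from the original script like this. The name remove_blank_files and the dry_run flag are my own additions for illustration and do not come from the posts above; with dry_run=True nothing is deleted, so the candidates can be checked first.

```python
import os


def is_blank_file(filename, chunk_size=1024):
    """Return True if the file holds only ASCII whitespace (or nothing)."""
    with open(filename, 'rb') as fh:
        while True:
            b = fh.read(chunk_size)
            if not b:
                return True
            if b.strip():
                return False


def remove_blank_files(path, dry_run=True):
    """Hypothetical helper: walk `path` and collect (or delete) blank files.

    With dry_run=True the matching files are only returned, not deleted.
    """
    matched = []
    for dirpath, _folder_names, files in os.walk(path):
        for filename in files:
            file_location = os.path.join(dirpath, filename)
            if os.path.isfile(file_location) and is_blank_file(file_location):
                if not dry_run:
                    os.remove(file_location)
                matched.append(file_location)
    return matched
```

Calling remove_blank_files(path) first and looking at the returned list, then calling it again with dry_run=False, is one cautious way to use it.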
Posts: 2,071
Threads: 9
Joined: May 2017
Jul-09-2022, 01:36 PM
(This post was last modified: Jul-09-2022, 01:36 PM by DeaD_EyE.)
Use pathlib.Path for better abstraction.
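This cannot work, because the last \ escapes the closing quote.
path = 'C:\Path\to\folder\'
Better:
path = r'C:\Path\to\folder'
This is a raw string, so escape sequences are not processed. A backslash directly before a quote (' or ") still keeps that quote from ending the string, which is why even a raw string must not end with a single backslash.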
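To make the escaping point concrete, here is a small sketch of a few ways to write the same Windows path. The variable names are only illustrative, and PureWindowsPath is used so the comparison also runs on non-Windows systems.

```python
from pathlib import PureWindowsPath

# Raw string: backslashes are kept literally (but it must not end in one).
raw = r'C:\Path\to\folder'

# Doubled backslashes in a normal string produce the same text.
escaped = 'C:\\Path\\to\\folder'

# Forward slashes are also understood by Windows path handling.
forward = 'C:/Path/to/folder'

# Without the raw prefix, \t and \f are escape sequences (tab, form feed),
# so the text is no longer the intended path.
mangled = 'C:\\Path\to\folder'

assert raw == escaped
assert PureWindowsPath(raw) == PureWindowsPath(forward)
assert '\t' in mangled and '\f' in mangled
assert mangled != raw
```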
sys.exit() is not required at the end: if only the main thread is running and there are no further statements and no loop, the program ends by itself.
To know whether a file contains only whitespace, you have to read the whole file. This is a big problem if the files are large, so you need a size limit. You can then read the files that are small enough, strip them, and delete the file when the result is an empty str/bytes.
The annotations are not mandatory, but they give the programmer an idea of which types should be used. Python does not check at runtime whether the right types are passed. They are useful, for example, to give an IDE better hints, so you will probably use them at some point in the future.
Code:
#!/usr/bin/env python3
from __future__ import annotations
from pathlib import Path

ONE_MIB = 1048576


def remove_empty(root: str | Path, max_size: int = ONE_MIB) -> None:
    """
    Removes all files which are 0 bytes big.
    Removes all files which are smaller than or equal to max_size and contain only whitespace.
    This prevents stripping files which are too big (reading them into memory and stripping takes time).
    """
    root = Path(root)
    for path in root.rglob("*"):
        if not path.is_file():
            # we want only real files
            # path could also be a directory
            continue
        size = path.stat().st_size
        # if size == 0, unlink
        # or if size is at most max_size, read the whole content of the file
        # into memory and strip whitespace. Empty bytes/str is False
        if size == 0 or (size <= max_size and not path.read_bytes().strip()):
            path.unlink()


if __name__ == "__main__":
    path = r"C:\Path\to\folder"
    # path had a backslash at the end, which escapes the quote.
    # This is also true for raw strings.
    remove_empty(path)
I like to use double quotes, because I can see them better.
Posts: 4,686
Threads: 73
Joined: Jan 2018
(Jul-09-2022, 01:34 PM)DeaD_EyE Wrote: To know if a file has only white spaces inside, you have to read the whole file. This is a big problem, if there are big files.
This is not true if you read the file in chunks of, say, 1024 bytes. If the file contains non-blank characters, it is highly probable that some of them appear very early in the file. Reading many large files that contain only blank characters is so improbable that you can neglect this issue.
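A small sketch can illustrate the argument by counting how many bytes the chunked check actually reads before it decides. The helper name blank_check_with_count is made up for this illustration, and in-memory io.BytesIO objects stand in for real files.

```python
import io


def blank_check_with_count(fh, chunk_size=1024):
    """Return (is_blank, bytes_read) for a binary file-like object."""
    bytes_read = 0
    while True:
        chunk = fh.read(chunk_size)
        if not chunk:
            return True, bytes_read
        bytes_read += len(chunk)
        if chunk.strip():
            return False, bytes_read


# About 10 MiB of text: the first chunk already contains non-blank bytes.
text_like = io.BytesIO(b"hello world\n" + b"x" * (10 * 1024 * 1024))
print(blank_check_with_count(text_like))   # (False, 1024)

# 1 MiB of nothing but newlines: every chunk has to be read.
blank_like = io.BytesIO(b"\n" * (1024 * 1024))
print(blank_check_with_count(blank_like))  # (True, 1048576)
```

Only the unusual all-blank file pays the cost of a full read; a file with ordinary content is rejected after the first chunk.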
Posts: 56
Threads: 23
Joined: Jul 2021
(Jul-09-2022, 01:36 PM)DeaD_EyE Wrote: Use pathlib.Path for better abstraction.
That worked fine, thank you.
Posts: 2,071
Threads: 9
Joined: May 2017
(Jul-09-2022, 01:38 PM)Gribouillis Wrote: This is not true if you read the file by chunks of say 1024 bytes. If the file contains non blank characters, it is highly probable that some of these non blank characters will appear very soon in the file.
In most cases this is true, but I once constructed a case where it is not.
I used it to simplify the usage of the ESP flash tool.
Instead of telling the flash tool to write at a 4k offset, I used a binary blob whose first 4k were zero bytes.
So the start address is 0 instead of 0x1000.
I haven't seen normal text files where the first 1024 bytes are whitespace.
If you want to be really safe, then read all chunks.
def is_blank_file(filename, chunk_size=1024):
    with open(filename, 'rb') as fh:
        while chunk := fh.read(chunk_size):
            if chunk.strip():
                # early non-whitespace -> fast
                return False
    # No non-whitespace was found. Reaching this point means the whole
    # file was read, which takes longer than the early return above.
    return True
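To see how this version behaves on a few edge cases, here is a short sketch that writes sample payloads to temporary files and checks each one. The helper classify and the sample file names are invented for this example; the walrus operator needs Python 3.8 or newer.

```python
import tempfile
from pathlib import Path


def is_blank_file(filename, chunk_size=1024):
    with open(filename, 'rb') as fh:
        while chunk := fh.read(chunk_size):
            if chunk.strip():
                return False
    return True


def classify(contents):
    """Hypothetical helper: write each payload to a temp file and test it."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, payload in contents.items():
            file_path = Path(tmp) / name
            file_path.write_bytes(payload)
            results[name] = is_blank_file(file_path)
    return results


samples = {
    "empty.txt": b"",
    "newlines.txt": b"\n\r\n\n",
    # content appears only after 4 KiB of spaces, so several chunks are read
    "late_text.txt": b" " * 4096 + b"data",
    # NUL bytes are not whitespace for bytes.strip(), so this is not blank
    "zero_bytes.bin": b"\x00" * 4096,
}
print(classify(samples))
```

Because all chunks are read, the file with late content is still reported as not blank; only the first two samples would be treated as blank.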