Posts: 12
Threads: 1
Joined: Feb 2020
Feb-09-2020, 12:56 AM
(This post was last modified: Feb-09-2020, 12:56 AM by puredata.)
(Feb-08-2020, 11:28 AM)jim2007 Wrote: It is almost certainly an issue with passing the correct file name to the function. Try hard coding one of the file names into the my_zip variable and see if it runs without error.
Thank you, you're correct: if I hard-code one of the zip files into the my_zip variable, the script works. How do you think I can modify this script so that it picks up the files from a folder?
Edit: this one works as well if I hard-code one of the zip files. I would like to iterate over every zip file, and the subfolders inside each zip file, in order to extract everything into one main folder.
import glob
import zipfile
import shutil
import os

my_dir = "/Users/myusernamehere/Documents/My_Dataset/new"
my_zip = r"/Users/myusernamehere/Documents/My_Dataset/something.zip"

with zipfile.ZipFile(my_zip) as zip:
    for zip_info in zip.infolist():
        if zip_info.filename[-1] == '/':
            continue
        zip_info.filename = os.path.basename(zip_info.filename)
        zip.extract(zip_info, my_dir)
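A rough sketch of how the single-zip script above might be looped over a whole folder. The function names extract_flat and extract_all_flat are made up for illustration, and the sketch assumes the zip files sit directly in one folder (no nested folders of zips) and that overwriting files with the same name is acceptable.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


def extract_flat(zip_path, dest_dir):
    """Extract every file in zip_path directly into dest_dir, dropping subfolders."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Directory entries have nothing to extract.
            if info.is_dir():
                continue
            # Member names inside a zip use forward slashes.
            name = PurePosixPath(info.filename).name
            target = dest_dir / name
            # Note: a later file with the same name overwrites an earlier one.
            with archive.open(info) as source, open(target, "wb") as out:
                shutil.copyfileobj(source, out)
            written.append(target)
    return written


def extract_all_flat(source_dir, dest_dir):
    """Run extract_flat on every .zip file found directly inside source_dir."""
    written = []
    for zip_path in sorted(Path(source_dir).glob("*.zip")):
        written.extend(extract_flat(zip_path, dest_dir))
    return written
```

It could then be called with the two paths from the script, for example extract_all_flat("/Users/myusernamehere/Documents/My_Dataset", my_dir).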
Posts: 24
Threads: 1
Joined: Jan 2020
Have a read of this, it should get you off the starting block in finding a list of files: https://realpython.com/working-with-files-in-python/
There is no passion to be found playing small - in settling for a life that is less than the one you are capable of living.
Posts: 12
Threads: 1
Joined: Feb 2020
Feb-10-2020, 03:45 PM
(This post was last modified: Feb-10-2020, 04:30 PM by puredata.)
(Feb-09-2020, 04:53 AM)jim2007 Wrote: Have a read of this, it should get you off the starting block in finding a list of files: https://realpython.com/working-with-files-in-python/
Thank you, it was a good read. I guess I don't yet have the experience or the logic to read that kind of thing and apply it to my case. I'm still waiting for a solution (though I've already lost hope).
I asked the same question on reddit, stackoverflow, the python forum, etc. Either the idea of helping others online has simply vanished, or helping others with programming is not like other practices. I have been a sound engineer for more than 13 years and have spent countless hours helping newbies (including programmers) solve their audio-related problems. It's sad that when you need the same help yourself, you can't find anyone to give it.
I appreciate your help and interest again.
import os

'''
For the given path, get the List of all files in the directory tree
'''
def getListOfFiles(dirName):
    # create a list of file and sub directories
    # names in the given directory
    listOfFile = os.listdir(dirName)
    allFiles = list()
    # Iterate over all the entries
    for entry in listOfFile:
        # Create full path
        fullPath = os.path.join(dirName, entry)
        # If entry is a directory then get the list of files in this directory
        if os.path.isdir(fullPath):
            allFiles = allFiles + getListOfFiles(fullPath)
        else:
            allFiles.append(fullPath)
    return allFiles


def main():
    dirName = '/home/varun/Downloads'
    # Get the list of all files in directory tree at given path
    listOfFiles = getListOfFiles(dirName)
    # Print the files
    for elem in listOfFiles:
        print(elem)
    print("****************")
    # Get the list of all files in directory tree at given path
    listOfFiles = list()
    for (dirpath, dirnames, filenames) in os.walk(dirName):
        listOfFiles += [os.path.join(dirpath, file) for file in filenames]
    # Print the files
    for elem in listOfFiles:
        print(elem)


if __name__ == '__main__':
    main()

I guess this script now prints out all the files in my directory, including subdirectories. Now I have to find out how to extract them all and I should be all set.
Posts: 24
Threads: 1
Joined: Jan 2020
So I did the same, but it is a function that returns a list of files:
import os
from typing import List

FileList = List[str]


def get_list_of_files(base_folder: str, files: FileList) -> None:
    with os.scandir(base_folder) as entries:
        for entry in entries:
            # Skip all hidden files and folders
            if entry.name.startswith('.'):
                continue
            # entry.path is already base_folder joined with the entry name
            qualified_name = entry.path
            # If it's a file we'll add it to the collection
            if entry.is_file():
                files.append(qualified_name)
            elif entry.is_dir():
                # If it's a folder then we'll process that as well
                get_list_of_files(qualified_name, files)


files = []
base_folder = '/Users/<user>/Development'
get_list_of_files(base_folder, files)
for file in files:
    print(file)

You pass it the base folder and an empty list and it will populate it for you.
PS - I'm just learning Python as well, but I have 30+ years of coding behind me.
Posts: 12
Threads: 1
Joined: Feb 2020
Feb-10-2020, 07:45 PM
(This post was last modified: Feb-10-2020, 07:45 PM by puredata.)
(Feb-10-2020, 04:51 PM)jim2007 Wrote: So I did the same, but it is a function that returns a list of files:
I can't thank you enough for your time, Jim. This lists all the files in my folder and I'm able to somewhat understand it! (Except for the fact that you use base_folder in the function but define it later on; I thought this code was interpreted line by line. But that's something I can research myself.)
Now I'll go ahead and try to export these files that we printed into a main folder. I'll resurrect the thread if I get stuck again. :)
Thanks a bunch!
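On the base_folder puzzle mentioned above, a small sketch may help. Inside the function, base_folder is a parameter, so it is a separate local name that gets its value from the caller. More generally, a def statement only creates the function; the body runs when the function is called, and names are looked up at that point. The names show_folder, uses_global and later_defined are invented for this illustration.

```python
def show_folder(base_folder):
    # base_folder here is a parameter: a local name that receives its value
    # from whatever the caller passes in, each time the function is called.
    return "listing " + base_folder


def uses_global():
    # This body refers to a module-level name. Python only looks it up when
    # the function runs, so it may be assigned after the def statement.
    return "global is " + later_defined


# The def statements above only created the function objects; no body ran yet.
later_defined = "set after the def"
base_folder = "/some/example/path"  # hypothetical path, for illustration only

result_param = show_folder(base_folder)
result_global = uses_global()
print(result_param)
print(result_global)
```

So the file is still executed top to bottom; it is just that executing a def line does not execute the indented body beneath it.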
Edit: it didn't take long for me to come up with another question. Here is my final script:
import os, zipfile, shutil
from typing import List

FileList = List[str]


def get_list_of_files(base_folder: str, files: FileList) -> None:
    with os.scandir(base_folder) as entries:
        for entry in entries:
            # Skip all hidden files and folders
            if entry.name.startswith('.'):
                continue
            # entry.path is already base_folder joined with the entry name
            qualified_name = entry.path
            # If it's a file we'll add it to the collection
            if entry.is_file():
                files.append(qualified_name)
            elif entry.is_dir():
                # If it's a folder then we'll process that as well
                get_list_of_files(qualified_name, files)


files = []
base_folder = r"C:\Users\username\My_Dataset"
get_list_of_files(base_folder, files)
for file in files:
    print(file)
    my_zipfile = zipfile.ZipFile(file)
    my_zipfile.extractall(r'C:\Users\username\My_Dataset\new')


# Generate the file paths to traverse, or a single path if a file name was given
def getfiles(path):
    if os.path.isdir(path):
        for root, dirs, files in os.walk(path):
            for name in files:
                yield os.path.join(root, name)
    else:
        yield path


destination = r"C:\Users\Documents\flatten"
fromdir = r"C:\Users\username\My_Dataset\new"
for f in getfiles(fromdir):
    filename = str.split(f, '/')[-1]
    if os.path.isfile(destination + filename):
        filename = f.replace(fromdir, "", 1).replace("/", "_")
    # os.rename(f, destination+filename)
    shutil.copy2(f, r"C:\Users\username\Documents\flatten")

Here is my question: what if I only want to process files that have specific extensions and leave the rest? I imagine I should build an if/else statement, but I'm not sure exactly where to put it or how to construct it.
For example:
if filetype is = shp, shx, cpg, prj, dbf
    continue
Would you have any input about this? Thanks a lot again in advance!
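A side note on the flattening step in the script above, offered as a sketch and not as a tested fix. On Windows, os.walk builds paths with backslashes, so splitting on '/' would likely return the whole path instead of the file name, and destination + filename joins the two without a separator. The computed filename is also never passed to shutil.copy2. A pathlib version avoids the separator handling altogether. The function name flatten_copy is hypothetical, and the sketch assumes the destination folder is not inside the source folder.

```python
import shutil
from pathlib import Path


def flatten_copy(from_dir, destination):
    """Copy every file under from_dir into destination, without subfolders."""
    from_dir = Path(from_dir)
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in sorted(from_dir.rglob("*")):
        if not source.is_file():
            continue
        target = destination / source.name
        if target.exists():
            # Name clash: build a name from the relative path instead,
            # e.g. sub/a.txt becomes sub_a.txt.
            relative = source.relative_to(from_dir)
            target = destination / "_".join(relative.parts)
        shutil.copy2(source, target)
        copied.append(target)
    return copied
```

The clash handling mirrors the idea in the original script (replace the separators with underscores); it does not guard against the renamed file clashing as well.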
Posts: 2,125
Threads: 11
Joined: May 2017
Feb-10-2020, 08:13 PM
(This post was last modified: Feb-10-2020, 08:23 PM by DeaD_EyE.)
Here is a simpler version, not tested much:
from pathlib import Path
from typing import Generator

FileList = Generator[Path, None, None]


def get_list_of_files(base_folder: Path) -> FileList:
    for entry in base_folder.rglob('*'):
        # Skip all hidden files and folders
        if entry.name.startswith('.'):
            continue
        if entry.is_file():
            yield entry


base_folder = Path.home() / 'Development'
files = list(get_list_of_files(base_folder))
for file in files:
    print(file)

By the way, the type hints are not needed, but they can help an IDE check for errors.
I changed FileList because the function is now a generator, which yields Path objects.
You can build the home path with Path.home() and join paths with /, which makes the code more readable. The recursion is also no longer needed; it can fail if your directory tree is deeper than Python's recursion limit (1000 by default).
The method rglob('*') does the job and finds all files and directories recursively.
Now to your other problem. You can extend the function to take an argument for the extensions you want to process.
from pathlib import Path

my_extensions = ['.txt', '.odt', '.csv']
my_path = Path('/A/directory/somewhere/that/does/not.exist/file.txt')
print('The suffix is:', my_path.suffix)
if my_path.suffix not in my_extensions:
    print(my_path, 'does not have an allowed file extension')

Instead of printing something, you would use continue in the for-loop to skip this element.
By the way, you can iterate over the generator directly. Before, you passed a list as an argument, and it was modified inside the function. Now the function has been turned into a generator (it contains the yield keyword). On each iteration the generator returns a path.
If the file list is not needed, you can remove files = []:
for file in get_list_of_files(base_folder):
    print(file)
    my_zipfile = zipfile.ZipFile(file)
    my_zipfile.extractall(r'C:\Users\username\My_Dataset\new')

Maybe the zipfile module doesn't understand the Path object (I haven't tested it yet), but a path can be converted into a string with str(file).
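Putting the two ideas from this post together, the generator could take the allowed extensions as an argument. This is only a sketch: the extensions parameter and the lower-casing of suffixes are additions for illustration and were not part of the posted code.

```python
from pathlib import Path
from typing import Iterable, Iterator


def get_list_of_files(base_folder: Path, extensions: Iterable[str]) -> Iterator[Path]:
    """Yield files under base_folder whose suffix is in extensions."""
    # Normalise once so that '.SHP' and '.shp' are treated the same.
    allowed = {ext.lower() for ext in extensions}
    for entry in base_folder.rglob("*"):
        # Skip hidden files (hidden parent folders are not checked here).
        if entry.name.startswith("."):
            continue
        if entry.is_file() and entry.suffix.lower() in allowed:
            yield entry
```

For the extensions asked about earlier in the thread, the call would look like get_list_of_files(base_folder, ['.shp', '.shx', '.cpg', '.prj', '.dbf']). On the Path question: according to the zipfile documentation, ZipFile accepts path-like objects from Python 3.6.2 onwards, so str(file) should only be needed on older versions.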
Posts: 24
Threads: 1
Joined: Jan 2020
(Feb-10-2020, 07:45 PM)puredata Wrote: Would you have any input about this? Thanks a lot again in advance!
I would stick it into the same if block that handles files (if entry.is_file():)
Posts: 1
Threads: 0
Joined: Mar 2020
Mar-09-2020, 03:20 PM
(This post was last modified: Mar-09-2020, 03:26 PM by syssy.)
(Feb-06-2020, 09:58 PM)puredata Wrote: Hello, I'm trying to use Python to automate unzipping of multiple files in a folder. I already have a script that works and unzips my files. But it won't copy contents of the zip files which has subfolders. In order to be able to extract everything into one main folder (disregarding original subfolder structure) I have found this code chunk on stackexchange. A moderator reported that the code is working for him therefore I should be investigating something else, not the code itself.
import os
import shutil
import zipfile

my_dir = r"C:\Users\username\My_Dataset\new"
my_zip = r"C:\Users\username\My_Dataset"

with zipfile.ZipFile(my_zip) as zip_file:
    for member in zip_file.namelist():
        filename = os.path.basename(member)
        # skip directories
        if not filename:
            continue
        # copy file (taken from zipfile's extract)
        source = zip_file.open(member)
        target = open(os.path.join(my_dir, filename), "wb")
        with source, target:
            shutil.copyfileobj(source, target)

Here is the Traceback I get:
Traceback (most recent call last):
  File "C:/Users/username/PycharmProjects/GeoPandas/mergeAnswer.py", line 8, in <module>
    with zipfile.ZipFile(my_zip) as zip_file:
  File "C:\Users\username\Anaconda3\envs\playground\lib\zipfile.py", line 1207, in __init__
    self.fp = io.open(file, filemode)
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\username\\My_Dataset'
The only thing that looks suspicious to me is the direction of the slashes on Traceback. Some of them are forwards slashes while some of them are backwards. Could it be related to that? I'm running this code in Windows and haven't tried on a mac or linux.
Would it be possible to get some suggestions in order to make this script work? I don't need to use this specific script, I'm just trying to unzip a lot of files (some are under subfolders some are not) into one main folder.
Thanks in advance!
Just copy the file from its current folder and temporarily paste it somewhere else on your computer. Now delete the original and replace it with the copy. Hopefully this will help.
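For anyone landing on the quoted traceback: it fits the diagnosis given earlier in the thread. In the quoted script, my_zip points at the My_Dataset folder and not at a .zip file inside it, and on Windows opening a folder as if it were a file is reported as PermissionError (Errno 13). The slash directions in the traceback are unlikely to be the cause. A small sketch that accepts either a single zip or a folder is below; the helper name find_zip_files is made up for illustration.

```python
import zipfile
from pathlib import Path


def find_zip_files(path):
    """Return the zip archives at path: the file itself, or those inside a folder."""
    path = Path(path)
    if path.is_dir():
        # A folder cannot be opened as an archive; look inside it instead.
        candidates = sorted(p for p in path.rglob("*") if p.is_file())
    else:
        candidates = [path]
    # is_zipfile checks the file contents, not just the extension.
    return [p for p in candidates if zipfile.is_zipfile(p)]
```

Each returned path can then be handed to zipfile.ZipFile in a loop, in place of the folder path.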