Python Forum (https://python-forum.io), General Coding Help
Thread: Decompressing bz2 in multiple sub-directories (/thread-2610.html)

Decompressing bz2 in multiple sub-directories - kiton - Mar-28-2017

Hi there! My root folder contains multiple sub-directories, each of which contains multiple *.json.bz2 files. My goal is to decompress the bz2 files and place them in the same sub-directories where they are. Using examples found online, I am trying to run the following code (please note two additional questions mentioned there):

    import sys
    import os
    import bz2
    from bz2 import decompress

    path = '/Users/mymac/Documents/Jupyter/Twitter/05'
    for subdir, dirs, files in os.walk(path):
        for filename in files:
            filepath = os.path.join(dirpath, filename)
            newfilepath = os.path.join(dirpath, filename[:-4])  # Remove ".bz2" at the end
            with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
                for data in iter(lambda : file.read(100 * 1024), b''):
                    new_file.write(data)
    # Question: how do I delete the "*.json.bz2" files after they are decompressed?

However, it returns the following error message:

    ---------------------------------------------------------------------------
    IsADirectoryError                         Traceback (most recent call last)
    <ipython-input-20-6e967e3c97bb> in <module>()
          9         filepath = os.path.join(dirpath, filename)
         10         newfilepath = os.path.join(dirpath, filename[:-4])
    ---> 11         with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
         12             for data in iter(lambda : file.read(100 * 1024), b''):
         13                 new_file.write(data)

    IsADirectoryError: [Errno 21] Is a directory: '/Users/mymac/Documents/Jupyter/Twitter/05/.'

Please advise what I am doing wrong here. Thank you in advance!

RE: Decompressing bz2 in multiple sub-directories - nilamo - Mar-28-2017

For #1, just slice it off:

    >>> fname = "something.json.bz2"
    >>> ".bz2" == fname[-4:]
    True
    >>> fname[:-4]
    'something.json'

RE: Decompressing bz2 in multiple sub-directories - kiton - Mar-28-2017

Thank you for the reply, nilamo. I got that part. However, something is still wrong and I cannot figure it out.
Consider the updated code and error message in the first post.

RE: Decompressing bz2 in multiple sub-directories - nilamo - Mar-28-2017

You should print filepath and newfilepath, since one of them is a directory and not a file. Once you know what's happening, you'll be able to fix it.

RE: Decompressing bz2 in multiple sub-directories - wavic - Mar-28-2017

Please do not replace the content of your original post. Post another one instead. You are trying to open a directory for reading or writing, not a file. Check 'filepath' and 'newfilepath'.

RE: Decompressing bz2 in multiple sub-directories - zivoni - Mar-28-2017

Just a small note: you could check that your file names really end with .bz2 (something like if filename.endswith('.bz2'):).

RE: Decompressing bz2 in multiple sub-directories - kiton - Mar-28-2017

nilamo, thank you for your suggestion. wavic, I got your point and will keep it in mind in the future (sorry). zivoni, I appreciate your comment.

    for dirpath, dirname, files in os.walk('/Users/mymac/Documents/Jupyter/Twitter/00/05/'):
        for filename in files:
            if filename.endswith('.json.bz2'):
                filepath = os.path.join(dirpath, filename)
                newfilepath = os.path.join(dirpath, filename[:-4])
                print(filepath)
                print(newfilepath)
                with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
                    for data in iter(lambda : file.read(100 * 1024), b''):
                        new_file.write(data)

Thank you, guys :)

RE: Decompressing bz2 in multiple sub-directories - nilamo - Mar-29-2017

So... does it work now?

RE: Decompressing bz2 in multiple sub-directories - kiton - Mar-29-2017

(Mar-29-2017, 03:58 PM)nilamo Wrote: So... does it work now?

Thanks for the follow-up. Yes, the following code works fine. I have one more question, though: how should I modify newfilepath so that each newly created (decompressed) file (in any of the sub-directories) gets a unique new name (e.g., 1.json, 2.json, ... , n.json)? Thanks in advance for help.
    for dirpath, dirname, files in os.walk('/Users/mymac/Documents/Jupyter/Twitter/1 day test'):
        for filename in files:
            if filename.endswith('.json.bz2'):
                filepath = os.path.join(dirpath, filename)
                newfilepath = os.path.join(dirpath, filename[:-4])
                print(newfilepath)
                with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
                    for data in iter(lambda : file.read(100 * 1024), b''):
                        new_file.write(data)

RE: Decompressing bz2 in multiple sub-directories - nilamo - Mar-29-2017

You could wrap os.walk() and files in enumerate(), but that might just make your code hard to read (and your counter would skip a few, so you'd have 1.json, then a 3.json without a 2). Here's how I'd do it, nice and simple:

    for dirpath, dirname, files in os.walk('/Users/mymac/Documents/Jupyter/Twitter/1 day test'):
        file_counter = 0  # restarts at 0 for every sub-directory
        for filename in files:
            if filename.endswith('.json.bz2'):
                filepath = os.path.join(dirpath, filename)
                file_counter += 1
                newfilepath = os.path.join(dirpath, "{0}.json".format(file_counter))
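
The last snippet stops once newfilepath is computed, and the earlier question about deleting the *.json.bz2 files after decompression is never answered in the thread. Below is a minimal sketch that puts the pieces together. It is not code from any poster, and it makes some assumptions: the function name decompress_tree and the delete_original flag are hypothetical, bz2.open with shutil.copyfileobj stands in for the manual read loop, and the file names are sorted so the numbering is stable between runs. Note that an existing 1.json, 2.json, ... in a sub-directory would be overwritten.

```python
import bz2
import os
import shutil


def decompress_tree(root, delete_original=False):
    """Decompress every *.json.bz2 under root into 1.json, 2.json, ...

    The counter restarts in each sub-directory, as in the post above.
    Returns the list of paths that were written.
    """
    written = []
    for dirpath, dirnames, files in os.walk(root):
        file_counter = 0
        for filename in sorted(files):  # sorted: assumption, for stable numbering
            if not filename.endswith('.json.bz2'):
                continue
            file_counter += 1
            filepath = os.path.join(dirpath, filename)
            newfilepath = os.path.join(dirpath, '{0}.json'.format(file_counter))
            # Stream the decompressed bytes so large files do not need to fit in memory.
            with bz2.open(filepath, 'rb') as src, open(newfilepath, 'wb') as dst:
                shutil.copyfileobj(src, dst)
            # Remove the archive only after the decompressed copy has been written.
            if delete_original:
                os.remove(filepath)
            written.append(newfilepath)
    return written
```

Calling decompress_tree(path, delete_original=True) with the root folder from the first post would decompress in place and remove each archive afterwards. It is probably worth running it once with delete_original=False on a copy of the data first.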