Python Forum
Decompressing bz2 in multiple sub-directories - Printable Version


Decompressing bz2 in multiple sub-directories - kiton - Mar-28-2017

Hi there! My root folder contains multiple sub-directories each of which contains multiple *.json.bz2 files. My goal is to decompress the bz2 files and place them in the same sub-directories where they are. Using examples found online, I am trying to run the following code (please note two additional questions mentioned there):

import sys
import os
import bz2
from bz2 import decompress

path = '/Users/mymac/Documents/Jupyter/Twitter/05'
for subdir, dirs, files in os.walk(path):
    for filename in files:
        filepath = os.path.join(dirpath, filename)
        newfilepath = os.path.join(dirpath, filename[:-4]) # Remove ".bz2" in the end
        with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
            for data in iter(lambda : file.read(100 * 1024), b''):
                new_file.write(data)
# Question: how do I delete the "*.json.bz2" files after they are decompressed?
However, it returns the following error message:
---------------------------------------------------------------------------
IsADirectoryError                         Traceback (most recent call last)
<ipython-input-20-6e967e3c97bb> in <module>()
     9         filepath = os.path.join(dirpath, filename)
    10         newfilepath = os.path.join(dirpath, filename[:-4])
---> 11         with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
    12             for data in iter(lambda : file.read(100 * 1024), b''):
    13                 new_file.write(data)

IsADirectoryError: [Errno 21] Is a directory: '/Users/mymac/Documents/Jupyter/Twitter/05/.'


Please advise what I am doing wrong here. Thank you in advance!


RE: Decompressing bz2 in multiple sub-directories - nilamo - Mar-28-2017

For #1, just slice it off:  
>>> fname = "something.json.bz2"
>>> ".bz2" == fname[-4:]
True
>>> fname[:-4]
'something.json'



RE: Decompressing bz2 in multiple sub-directories - kiton - Mar-28-2017

Thank you for the reply, nilamo. I got that part. However, something is still wrong and I cannot figure it out. Please see the updated code and error message in the first post.


RE: Decompressing bz2 in multiple sub-directories - nilamo - Mar-28-2017

You should print(filepath) and print(newfilepath), since one of them is a directory and not a file. Once you know what's happening, you'll be able to fix it.


RE: Decompressing bz2 in multiple sub-directories - wavic - Mar-28-2017

Please do not replace the content of your original post. Post a new reply instead.

You are trying to open a directory for reading or writing, not a file. Check 'filepath' and 'newfilepath'.
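
One way this can happen: slicing four characters off a file name that is four characters or shorter leaves an empty string, and os.path.join(dirpath, '') points at the directory itself. A minimal sketch for spotting such cases before opening anything (the function name find_directory_targets is made up for illustration, and it reuses the filename[:-4] slicing from the first post):

```python
import os

def find_directory_targets(root):
    """Return (source, target) pairs whose target path is an existing directory."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            source = os.path.join(dirpath, filename)
            # Same slicing as in the first post: drop the last four characters.
            target = os.path.join(dirpath, filename[:-4])
            if os.path.isdir(target):
                hits.append((source, target))
    return hits
```

Calling it on the root folder and printing the result shows which files would make open(newfilepath, 'wb') fail with IsADirectoryError.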


RE: Decompressing bz2 in multiple sub-directories - zivoni - Mar-28-2017

Just a small note: you could check that your file names really end with .bz2 (something like if filename.endswith('.bz2'):).
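
A small sketch of that check combined with the slicing shown earlier, so the suffix is removed only when it is actually there (the helper name strip_bz2_suffix is hypothetical, not something from the bz2 module):

```python
def strip_bz2_suffix(filename):
    """Return filename without a trailing '.bz2', or None if it has no such suffix."""
    suffix = ".bz2"
    if filename.endswith(suffix):
        return filename[:-len(suffix)]
    return None

print(strip_bz2_suffix("something.json.bz2"))  # something.json
print(strip_bz2_suffix(".DS_Store"))           # None
```

Files for which it returns None can simply be skipped in the loop.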


RE: Decompressing bz2 in multiple sub-directories - kiton - Mar-28-2017

nilamo, thank you for your suggestion.
wavic, I got your point and will keep it in mind in the future (sorry).
zivoni, I appreciate your comment.

for dirpath, dirname, files in os.walk('/Users/mymac/Documents/Jupyter/Twitter/00/05/'):
    for filename in files:
        if filename.endswith('.json.bz2'):
            filepath = os.path.join(dirpath, filename)
            newfilepath = os.path.join(dirpath, filename[:-4])
            print(filepath)
            print(newfilepath)
            with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
                for data in iter(lambda : file.read(100 * 1024), b''):
                    new_file.write(data)
Thank you, guys :)


RE: Decompressing bz2 in multiple sub-directories - nilamo - Mar-29-2017

So... does it work now?


RE: Decompressing bz2 in multiple sub-directories - kiton - Mar-29-2017

(Mar-29-2017, 03:58 PM)nilamo Wrote: So... does it work now?

Thanks for the follow-up. Yes, the following code works fine. I have one more question, though: how should I modify newfilepath so that each newly created (decompressed) file (in any of the sub-directories) has a unique new name (e.g., 1.json, 2.json, ... , n.json)? Thanks in advance for the help.

for dirpath, dirname, files in os.walk('/Users/mymac/Documents/Jupyter/Twitter/1 day test'):
    for filename in files:
        if filename.endswith('.json.bz2'):
            filepath = os.path.join(dirpath, filename)
            newfilepath = os.path.join(dirpath, filename[:-4])
            print(newfilepath)
            with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
                for data in iter(lambda : file.read(100 * 1024), b''):
                    new_file.write(data)
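
The first post also asked how to delete each "*.json.bz2" file after it is decompressed. A minimal sketch of one way to do it, assuming the archive should be removed only when decompression finished without raising an exception (the function name decompress_and_remove is made up for illustration; bz2.open, shutil.copyfileobj and os.remove are standard-library calls):

```python
import bz2
import os
import shutil

def decompress_and_remove(root):
    """Decompress every *.json.bz2 under root in place, then delete the archive."""
    written = []
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".json.bz2"):
                continue
            source = os.path.join(dirpath, filename)
            target = os.path.join(dirpath, filename[:-4])  # drop ".bz2"
            with bz2.open(source, "rb") as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)  # copies in chunks, not all at once
            # Reached only if the block above raised no exception.
            os.remove(source)
            written.append(target)
    return written
```

It returns the list of decompressed paths, so the result can be printed or counted. It is worth trying it on a copy of the data first, since the deletion cannot be undone.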



RE: Decompressing bz2 in multiple sub-directories - nilamo - Mar-29-2017

You could wrap os.walk() and files in enumerate(), but that might just make your code hard to read (and your counter would skip a few, so you'd have 1.json, then a 3.json without a 2). Here's how I'd do it, nice and simple:

for dirpath, dirname, files in os.walk('/Users/mymac/Documents/Jupyter/Twitter/1 day test'):
   file_counter = 0
   for filename in files:
       if filename.endswith('.json.bz2'):
           filepath = os.path.join(dirpath, filename)
           file_counter += 1
           newfilepath = os.path.join(dirpath, "{0}.json".format(file_counter))