Python Forum
Decompressing bz2 in multiple sub-directories
#1
Hi there! My root folder contains multiple sub-directories each of which contains multiple *.json.bz2 files. My goal is to decompress the bz2 files and place them in the same sub-directories where they are. Using examples found online, I am trying to run the following code (please note two additional questions mentioned there):

import sys
import os
import bz2
from bz2 import decompress

path = '/Users/mymac/Documents/Jupyter/Twitter/05'
for subdir, dirs, files in os.walk(path):
    for filename in files:
        filepath = os.path.join(dirpath, filename)
        newfilepath = os.path.join(dirpath, filename[:-4]) # Remove ".bz2" in the end
        with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
            for data in iter(lambda : file.read(100 * 1024), b''):
                new_file.write(data)
# Question: how do I delete the "*.json.bz2" files after they are decompressed?
However, it returns the following error message:
---------------------------------------------------------------------------
IsADirectoryError                         Traceback (most recent call last)
<ipython-input-20-6e967e3c97bb> in <module>()
     9         filepath = os.path.join(dirpath, filename)
    10         newfilepath = os.path.join(dirpath, filename[:-4])
---> 11         with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
    12             for data in iter(lambda : file.read(100 * 1024), b''):
    13                 new_file.write(data)

IsADirectoryError: [Errno 21] Is a directory: '/Users/mymac/Documents/Jupyter/Twitter/05/.'


Please advise what I am doing wrong here. Thank you in advance!
#2
For #1, just slice it off:  
>>> fname = "something.json.bz2"
>>> ".bz2" == fname[-4:]
True
>>> fname[:-4]
'something.json'
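
The slice can be made a bit safer, as a rough sketch: strip the suffix only when it is actually there, so that names not ending in ".bz2" pass through unchanged instead of losing their last four characters. The function name strip_bz2_suffix is made up for this illustration.

```python
def strip_bz2_suffix(fname):
    """Return fname without a trailing '.bz2'; leave other names unchanged."""
    suffix = ".bz2"
    if fname.endswith(suffix):
        return fname[:-len(suffix)]
    return fname


print(strip_bz2_suffix("something.json.bz2"))
print(strip_bz2_suffix("notes.txt"))
```

With a blind fname[:-4], the second name would have been cut down to "notes" rather than left alone.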
#3
Thank you for the reply, nilamo. I got that part. However, something is still wrong and I cannot figure it out. Please see the updated code and error message in the first post.
#4
You should print both filepath and newfilepath, since one of them is a directory and not a file. Once you know what's happening, you'll be able to fix it.
#5
Please do not replace the content of your original post. Post another one instead.

You are trying to open a directory for reading or writing, not a file. Check 'filepath' and 'newfilepath'.
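
One way to run that check, sketched with a hypothetical helper called describe_path (not from the original code): it reports whether a path is an existing directory, an existing file, or missing, using os.path.isdir and os.path.isfile. It is also worth noting that the loop in the first post binds the name subdir while the body uses dirpath, so in a notebook dirpath would either be undefined or left over from an earlier cell.

```python
import os


def describe_path(path):
    """Classify a path as 'directory', 'file', or 'missing'."""
    if os.path.isdir(path):
        return "directory"
    if os.path.isfile(path):
        return "file"
    return "missing"


# Joining a directory with "." still names the directory itself,
# and open(..., 'wb') cannot write to a directory.
print(describe_path(os.path.join(os.getcwd(), ".")))
```

Calling describe_path(filepath) and describe_path(newfilepath) inside the loop, just before the with statement, should show which of the two is the culprit.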
#6
Just a small note: you could check that your file names really end with .bz2 (something like if filename.endswith('.bz2'):).
#7
nilamo, thank you for your suggestion.
wavic, I got your point and will keep it in mind in future (sorry).
zivoni, I appreciate your comment.

for dirpath, dirname, files in os.walk('/Users/mymac/Documents/Jupyter/Twitter/00/05/'):
    for filename in files:
        if filename.endswith('.json.bz2'):
            filepath = os.path.join(dirpath, filename)
            newfilepath = os.path.join(dirpath, filename[:-4])
            print(filepath)
            print(newfilepath)
            with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
                for data in iter(lambda : file.read(100 * 1024), b''):
                    new_file.write(data)
Thank you, guys :)
#8
So... does it work now?
#9
(Mar-29-2017, 03:58 PM)nilamo Wrote: So... does it work now?

Thanks for the follow-up. Yes, the following code works fine. I do have one more question: how should I modify newfilepath so that each newly created (decompressed) file, in any of the sub-directories, gets a unique new name (e.g., 1.json, 2.json, ... , n.json)? Thanks in advance for your help.

for dirpath, dirname, files in os.walk('/Users/mymac/Documents/Jupyter/Twitter/1 day test'):
    for filename in files:
        if filename.endswith('.json.bz2'):
            filepath = os.path.join(dirpath, filename)
            newfilepath = os.path.join(dirpath, filename[:-4])
            print(newfilepath)
            with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
                for data in iter(lambda : file.read(100 * 1024), b''):
                    new_file.write(data)
#10
You could wrap os.walk() and files in enumerate(), but that might just make your code hard to read (and your counter would skip a few, so you'd have 1.json, then a 3.json without a 2). Here's how I'd do it, nice and simple:

for dirpath, dirname, files in os.walk('/Users/mymac/Documents/Jupyter/Twitter/1 day test'):
    file_counter = 0
    for filename in files:
        if filename.endswith('.json.bz2'):
            filepath = os.path.join(dirpath, filename)
            file_counter += 1
            newfilepath = os.path.join(dirpath, "{0}.json".format(file_counter))
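
For completeness, here is a rough end-to-end sketch that combines the counter above with the decompression step and also covers the question from the first post about deleting the "*.json.bz2" files afterwards. The function name decompress_tree and the delete_source flag are made up for this illustration. It uses bz2.open and shutil.copyfileobj in place of the manual read loop, and sorts the file names so the numbering is repeatable. Both of those are choices made here and were not suggested in the thread.

```python
import bz2
import os
import shutil


def decompress_tree(root, delete_source=False):
    """Decompress every *.json.bz2 under root into numbered .json files.

    Numbering restarts at 1 in each directory. Returns the paths written.
    delete_source is a hypothetical flag: when True, each .bz2 file is
    removed after it has been decompressed.
    """
    written = []
    for dirpath, dirnames, files in os.walk(root):
        file_counter = 0
        for filename in sorted(files):  # sorted for repeatable numbering
            if not filename.endswith('.json.bz2'):
                continue
            file_counter += 1
            filepath = os.path.join(dirpath, filename)
            newfilepath = os.path.join(dirpath, "{0}.json".format(file_counter))
            # Stream the decompressed bytes so large files do not have
            # to fit in memory all at once.
            with bz2.open(filepath, 'rb') as src, open(newfilepath, 'wb') as dst:
                shutil.copyfileobj(src, dst)
            if delete_source:
                # Reached only if the copy above finished without an error.
                os.remove(filepath)
            written.append(newfilepath)
    return written
```

Because file_counter is reset for every directory, the names are unique within a directory but repeat across directories. Moving file_counter = 0 above the os.walk loop would give numbers that are unique across the whole tree. Note also that an existing 1.json, 2.json, and so on would be overwritten, so it is worth trying this on a copy of the data first.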