Decompressing bz2 in multiple sub-directories

***zivoni*** · (This post was last modified: Mar-29-2017, 05:00 PM by zivoni.)

You can use an auxiliary variable: initialize it to 1 before your inner "files" for statement, use it to build the filename, and increment it after each use.

for dirpath, dirname, ...
    file_number = 1
    for filename in ..
      ....

      new_filename = "{:03d}.json".format(file_number)  # you can use just str() + ".json", but format adds more options
      file_number += 1
      newfilepath = os.path.join(dirpath, new_filename)
      ...

If you have old files with the same names, this will overwrite them, so be careful. Also, with "numerical" file names it's sometimes useful to zero-fill the number so they sort naturally (not 1.json, 11.json, ..19.json, 2.json, but 001.json, 002.json...).
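To illustrate the sorting point, here is a small sketch (the file numbers are made up for the example). Plain string sorting puts 11.json before 2.json, while zero-padded names come out in numeric order:

```python
# Hypothetical file numbers, chosen only to illustrate the sort order.
numbers = [1, 2, 11, 19]

plain = sorted("{}.json".format(n) for n in numbers)
padded = sorted("{:03d}.json".format(n) for n in numbers)

print(plain)   # ['1.json', '11.json', '19.json', '2.json']
print(padded)  # ['001.json', '002.json', '011.json', '019.json']
```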

kiton · (This post was last modified: Mar-29-2017, 05:43 PM by kiton.)

nilamo, zivoni, thank you very much for the suggestions. I tried them both and they work fine, creating the relevant numbering. However, note that the counter restarts with each new directory. What is the appropriate way to make it keep going until the last file (e.g., 99999.json)?

print(newfilepath) # example utilizing zivoni's code (nilamo's works in a similar fashion)

/Users/mymac/Documents/Jupyter/Twitter/1 day test/00/13/058.json
/Users/mymac/Documents/Jupyter/Twitter/1 day test/00/13/059.json
/Users/mymac/Documents/Jupyter/Twitter/1 day test/00/13/060.json
/Users/mymac/Documents/Jupyter/Twitter/1 day test/00/14/001.json
/Users/mymac/Documents/Jupyter/Twitter/1 day test/00/14/002.json
/Users/mymac/Documents/Jupyter/Twitter/1 day test/00/14/003.json

**nilamo** · Mar-29-2017, 05:53 PM

Instead of setting the file_counter (or whatever) inside the for loop, set it once before either loop starts.

wavic · Mar-29-2017, 06:02 PM

for file_number, file_name in enumerate(files, 1):

    new_filename = "{:03d}.json".format(file_number)

Use this if you want the file numbers to start from 1. If not, use enumerate(files), which starts from 0.
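A quick sketch of the difference between the two calls (the file names are placeholders, not from the original post):

```python
# Hypothetical file names, used only for illustration.
files = ["a.json.bz2", "b.json.bz2", "c.json.bz2"]

# enumerate(files, 1) starts counting at 1; enumerate(files) starts at 0.
from_one = ["{:03d}.json".format(n) for n, _ in enumerate(files, 1)]
from_zero = ["{:03d}.json".format(n) for n, _ in enumerate(files)]

print(from_one)   # ['001.json', '002.json', '003.json']
print(from_zero)  # ['000.json', '001.json', '002.json']
```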

**nilamo** · Mar-29-2017, 06:33 PM

But then it's re-starting for every directory. OP said they want to maintain counts across directories.

kiton · (This post was last modified: Mar-29-2017, 07:05 PM by kiton.)

Yep, I keep trying different variations of the code, but the count always restarts in each new directory. I'd like there to be no "duplicate" names, just consecutive numbers, if possible.

Code variations tried:

# nilamo's version
for dirpath, dirname, files in os.walk('/Users/mymac/Documents/Jupyter/Twitter/1 day test'):
    file_counter = 0
    for filename in files:
        file_counter += 1 # tried other placements, too
        if filename.endswith('.json.bz2'):
            filepath = os.path.join(dirpath, filename)
            newfilepath = os.path.join(dirpath, "{0}.json".format(file_counter))
            print(newfilepath)
            with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
                for data in iter(lambda : file.read(100 * 1024), b''):
                    new_file.write(data)

and

# zivoni's version
for dirpath, dirname, files in os.walk('/Users/mymac/Documents/Jupyter/Twitter/1 day test'):
    file_number = 1
    for filename in files:
        if filename.endswith('.json.bz2'):
            filepath = os.path.join(dirpath, filename)
            file_number += 1 # tried other placements, too
            for file_number, file_name in enumerate(files, 1):
                new_filename = "{:03d}.json".format(file_number)  # you can use just str() + ".json", but format adds more options
                newfilepath = os.path.join(dirpath, new_filename)
                with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
                    for data in iter(lambda : file.read(100 * 1024), b''):
                        new_file.write(data)

wavic · (This post was last modified: Mar-29-2017, 07:29 PM by wavic.)

The counter should be initialized outside of the os.walk() loop:

file_number = 1
for dirpath, dirname, files in os.walk('/Users/mymac/Documents/Jupyter/Twitter/1 day test'):
    for filename in files:
        if filename.endswith('.json.bz2'):
            filepath = os.path.join(dirpath, filename)
            for file_name in files:
                new_filename = "{:03d}.json".format(file_number)  # you can use just str() + ".json", but format adds more options
                newfilepath = os.path.join(dirpath, new_filename)
                with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
                    for data in iter(lambda : file.read(100 * 1024), b''):
                        new_file.write(data)
                        file_number += 1 # tried other placements, too

**nilamo** · Mar-29-2017, 07:34 PM

Right, the +1 was fine in both places. What you wanted to change was the counter getting set to 0 for every directory. So, instead of resetting it in every directory, set it just once, before any of the for loops.
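The effect of where the counter is set can be shown without touching the file system. This is only a toy sketch: `tree` is a made-up stand-in for what os.walk() yields, and `number_files` and its `reset_per_directory` flag are hypothetical names introduced for the comparison.

```python
# Hypothetical stand-in for os.walk() output: (directory, files) pairs.
tree = [("00/13", ["a.bz2", "b.bz2"]), ("00/14", ["c.bz2", "d.bz2"])]


def number_files(tree, reset_per_directory):
    """Return the numbered names produced for every file in tree."""
    names = []
    counter = 0  # set once, before any loop
    for dirpath, files in tree:
        if reset_per_directory:
            counter = 0  # resetting here makes numbering restart per directory
        for _ in files:
            counter += 1
            names.append("{}/{:03d}.json".format(dirpath, counter))
    return names


print(number_files(tree, reset_per_directory=True))
# ['00/13/001.json', '00/13/002.json', '00/14/001.json', '00/14/002.json']
print(number_files(tree, reset_per_directory=False))
# ['00/13/001.json', '00/13/002.json', '00/14/003.json', '00/14/004.json']
```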

***zivoni*** · Mar-29-2017, 09:22 PM

Yes, as wavic posted. There is a misindent on the last line of his post (file_number += 1): it should be on the same level as newfilepath = ... (as posted, it increases once per 100 kB read).
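To see why the placement matters, here is a minimal sketch of the same chunked-read loop run over an in-memory buffer. The 250 KiB payload is an assumption chosen for the example; any statement inside the loop body runs once per chunk, not once per file.

```python
import io

CHUNK = 100 * 1024  # same chunk size as in the thread's code

# Hypothetical 250 KiB payload standing in for one decompressed file.
source = io.BytesIO(b"x" * (250 * 1024))

chunks_read = 0
for data in iter(lambda: source.read(CHUNK), b""):
    chunks_read += 1  # a counter placed here advances once per chunk

print(chunks_read)  # 3
```

So a single file of this size would push a misindented file_number forward by 3 instead of 1.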

kiton · Mar-29-2017, 11:35 PM

I'd like to thank everybody for being so supportive and helpful. The task has been accomplished with the following code:

import bz2
import os

file_counter = 0  # set once, before both loops, so numbering continues across directories
for dirpath, dirname, files in os.walk('/Users/mymac/Documents/Jupyter/Twitter/1 day test'):
    for filename in files:
        file_counter += 1  # advances for every file, not only .json.bz2 ones
        if filename.endswith('.json.bz2'):
            filepath = os.path.join(dirpath, filename)
            newfilepath = os.path.join(dirpath, "{0}.json".format(file_counter))
            print(newfilepath)
            with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
                # copy the decompressed data in 100 KiB chunks
                for data in iter(lambda: file.read(100 * 1024), b''):
                    new_file.write(data)
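One thing to note about the code above: the counter advances for every file, so the numbers will have gaps if a directory also holds files that are not .json.bz2. A possible variation, offered only as a sketch, counts matching files only and zero-fills the number as suggested earlier in the thread. The function name `decompress_tree` and its `width` parameter are hypothetical; `bz2.open` and `shutil.copyfileobj` are standard-library calls swapped in for the manual chunk loop.

```python
import bz2
import os
import shutil


def decompress_tree(root, width=5):
    """Decompress every *.json.bz2 under root into numbered .json files.

    Hypothetical helper, not from the thread. Returns the paths written.
    """
    written = []
    file_counter = 0  # initialized once, so numbering continues across directories
    for dirpath, dirnames, files in os.walk(root):
        dirnames.sort()  # visit sub-directories in a predictable order
        for filename in sorted(files):
            if not filename.endswith(".json.bz2"):
                continue  # skipped files do not consume a number
            file_counter += 1
            new_name = "{:0{}d}.json".format(file_counter, width)
            new_path = os.path.join(dirpath, new_name)
            with bz2.open(os.path.join(dirpath, filename), "rb") as src, \
                    open(new_path, "wb") as dst:
                shutil.copyfileobj(src, dst)  # streams the data in chunks
            written.append(new_path)
    return written
```

Calling decompress_tree('/Users/mymac/Documents/Jupyter/Twitter/1 day test') would write 00001.json, 00002.json, and so on next to the compressed files. As with the original, any existing file with the same name is overwritten.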
