Python Forum
Decompressing bz2 in multiple sub-directories
#11
You can use an auxiliary variable: initialize it to 1 before your first "files" for statement, use it to build the filename, and increment it after each use.

for dirpath, dirname, ...
    file_number = 1
    for filename in ..
      ....

      new_filename = "{:03d}.json".format(file_number)  # you can use just str() + ".json", but format adds more options
      file_number += 1
      newfilepath = os.path.join(dirpath, new_filename)
      ...
If old files with the same names already exist, they will be overwritten, so be careful. With "numerical" file names it is often useful to zero-fill the number so that the names sort naturally (not 1.json, 11.json, ..19.json, 2.json, but 001.json, 002.json...).
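A small sketch of the sorting difference, using made-up numbers purely for illustration:

```python
numbers = (1, 2, 11, 19)  # illustrative values only

plain = ["{}.json".format(n) for n in numbers]
padded = ["{:03d}.json".format(n) for n in numbers]

# Plain string sorting compares character by character,
# so "11.json" lands before "2.json".
print(sorted(plain))   # ['1.json', '11.json', '19.json', '2.json']

# Zero-filled names have equal width, so string order matches numeric order.
print(sorted(padded))  # ['001.json', '002.json', '011.json', '019.json']
```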
Reply
#12
nilamo, zivoni, thank you very much for the suggestions. I tried them both and they work fine, creating the relevant numbering. However, note that with each new directory the counter starts over. What is the appropriate way to make it keep counting until the last file (e.g., 99999.json)?

print(newfilepath) # example utilizing zivoni's code (nilamo's works in a similar fashion)

/Users/mymac/Documents/Jupyter/Twitter/1 day test/00/13/058.json
/Users/mymac/Documents/Jupyter/Twitter/1 day test/00/13/059.json
/Users/mymac/Documents/Jupyter/Twitter/1 day test/00/13/060.json
/Users/mymac/Documents/Jupyter/Twitter/1 day test/00/14/001.json
/Users/mymac/Documents/Jupyter/Twitter/1 day test/00/14/002.json
/Users/mymac/Documents/Jupyter/Twitter/1 day test/00/14/003.json
Reply
#13
Instead of setting the file_counter (or whatever) inside the for loop, set it once before either loop starts.
Reply
#14
for file_number, file_name in enumerate(files, 1):

    new_filename = "{:03d}.json".format(file_number)
    
This form starts the file names from 1. Plain enumerate(files) starts from 0.
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
Reply
#15
But then it's re-starting for every directory. OP said they want to maintain counts across directories.
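One way enumerate could still work is to apply it to a single stream of paths spanning all directories rather than to each directory's `files` list. The following is only a sketch: `iter_bz2_paths` and `numbered_targets` are hypothetical helper names, and the five-digit padding is an assumption based on the 99999.json example.

```python
import os


def iter_bz2_paths(root):
    """Yield the path of every .json.bz2 file under root (hypothetical helper)."""
    for dirpath, dirnames, files in os.walk(root):
        dirnames.sort()  # visit sub-directories in a predictable order
        for filename in sorted(files):
            if filename.endswith('.json.bz2'):
                yield os.path.join(dirpath, filename)


def numbered_targets(root):
    """Pair each archive with a continuously numbered .json path in its own directory."""
    pairs = []
    # enumerate runs once over the whole tree, so the count never restarts
    for file_number, filepath in enumerate(iter_bz2_paths(root), 1):
        new_filename = "{:05d}.json".format(file_number)
        pairs.append((filepath, os.path.join(os.path.dirname(filepath), new_filename)))
    return pairs
```

This only computes the target names. The decompression step would go inside the loop.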
Reply
#16
Yep, I keep trying different variations of the code, but the numbering always restarts in each new directory. I'd like there to be no "duplicate" names, just consecutive numbers, if possible.

Code variations tried:
# nilamo's version
for dirpath, dirname, files in os.walk('/Users/mymac/Documents/Jupyter/Twitter/1 day test'):
    file_counter = 0
    for filename in files:
        file_counter += 1 # tried other placements, too
        if filename.endswith('.json.bz2'):
            filepath = os.path.join(dirpath, filename)
            newfilepath = os.path.join(dirpath, "{0}.json".format(file_counter))
            print(newfilepath)
            with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
                for data in iter(lambda : file.read(100 * 1024), b''):
                    new_file.write(data)
and
# zivoni's version
for dirpath, dirname, files in os.walk('/Users/mymac/Documents/Jupyter/Twitter/1 day test'):
    file_number = 1
    for filename in files:
        if filename.endswith('.json.bz2'):
            filepath = os.path.join(dirpath, filename)
            file_number += 1 # tried other placements, too
            for file_number, file_name in enumerate(files, 1):
                new_filename = "{:03d}.json".format(file_number)  # you can use just str() + ".json", but format adds more options
                newfilepath = os.path.join(dirpath, new_filename)
                with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
                    for data in iter(lambda : file.read(100 * 1024), b''):
                        new_file.write(data)
Reply
#17
The counter should be initialized outside of the os.walk() loop:

file_number = 1
for dirpath, dirname, files in os.walk('/Users/mymac/Documents/Jupyter/Twitter/1 day test'):
    for filename in files:
        if filename.endswith('.json.bz2'):
            filepath = os.path.join(dirpath, filename)
            for file_name in files:
                new_filename = "{:03d}.json".format(file_number)  # you can use just str() + ".json", but format adds more options
                newfilepath = os.path.join(dirpath, new_filename)
                with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
                    for data in iter(lambda : file.read(100 * 1024), b''):
                        new_file.write(data)
                        file_number += 1 # tried other placements, too
Reply
#18
Right, the += 1 was fine in both places. What needed to change was the counter getting reset to 0 for every directory. So, instead of setting it to 0 in every directory, set it once, before any of the for loops.
Reply
#19
Yes, as wavic posted. There is a misindent on the last line of his post (file_number += 1): it should be at the same level as newfilepath = ...
(as posted, it increases once per 100 kB read).
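A minimal sketch of the loop with the counter set once and incremented once per decompressed file. The function name `decompress_tree` and the five-digit padding are assumptions for illustration, and `shutil.copyfileobj` is swapped in for the `iter(lambda: ...)` read loop used earlier in the thread.

```python
import bz2
import os
import shutil


def decompress_tree(root):
    """Decompress every .json.bz2 under root, numbering the outputs continuously."""
    file_number = 1  # set once, before either loop, so it never resets
    written = []
    for dirpath, dirnames, files in os.walk(root):
        for filename in files:
            if filename.endswith('.json.bz2'):
                filepath = os.path.join(dirpath, filename)
                new_filename = "{:05d}.json".format(file_number)
                newfilepath = os.path.join(dirpath, new_filename)
                with bz2.BZ2File(filepath, 'rb') as src, open(newfilepath, 'wb') as dst:
                    shutil.copyfileobj(src, dst, 100 * 1024)  # copy in 100 kB chunks
                file_number += 1  # once per file, not once per chunk
                written.append(newfilepath)
    return written
```

Calling `decompress_tree('/Users/mymac/Documents/Jupyter/Twitter/1 day test')` would return the list of paths it wrote. As noted earlier in the thread, any existing file with the same name is overwritten.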
Reply
#20
I'd like to thank everybody for being so supportive and helpful. The task has been accomplished with the following code:

import bz2
import os

file_counter = 0  # set once, before either loop, so numbering continues across directories
for dirpath, dirname, files in os.walk('/Users/mymac/Documents/Jupyter/Twitter/1 day test'):
    for filename in files:
        file_counter += 1  # counts every file, so numbers are skipped for non-.bz2 files
        if filename.endswith('.json.bz2'):
            filepath = os.path.join(dirpath, filename)
            newfilepath = os.path.join(dirpath, "{0}.json".format(file_counter))
            print(newfilepath)
            with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
                # read and write in 100 kB chunks until read() returns b''
                for data in iter(lambda: file.read(100 * 1024), b''):
                    new_file.write(data)
Reply

