Posts: 331
Threads: 2
Joined: Feb 2017
Mar-29-2017, 04:55 PM
(This post was last modified: Mar-29-2017, 05:00 PM by zivoni.)
You can use an auxiliary variable: initialize it to 1 before your first "files" for statement, use it to build the filename, and increment it after each use.
for dirpath, dirname, ...
    file_number = 1
    for filename in ..
        ....
        new_filename = "{:03d}.json".format(file_number)  # you can use just str() + ".json", but format adds more options
        file_number += 1
        newfilepath = os.path.join(dirpath, new_filename)
        ...
If you have some old files with the same name, it will overwrite them, so be careful. And with "numerical" file names it's sometimes useful to zero-fill the number so that they sort naturally (not 1.json, 11.json, ..19.json, 2.json, but 001.json, 002.json...).
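To illustrate the sorting point, here is a minimal sketch. The numbers are made up for the demonstration; it only shows how plain string sorting treats padded and unpadded names.

```python
# Plain string sorting compares character by character,
# so "11.json" sorts before "2.json" unless the numbers are zero-filled.
numbers = [1, 2, 11, 19]  # hypothetical file numbers

unpadded = sorted("{}.json".format(n) for n in numbers)
padded = sorted("{:03d}.json".format(n) for n in numbers)

print(unpadded)  # ['1.json', '11.json', '19.json', '2.json']
print(padded)    # ['001.json', '002.json', '011.json', '019.json']
```

Note that three digits only keep the order intact up to 999 files; a wider field (for example {:05d}) would be needed beyond that.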
Posts: 70
Threads: 17
Joined: Feb 2017
Mar-29-2017, 05:41 PM
(This post was last modified: Mar-29-2017, 05:43 PM by kiton.)
nilamo, zivoni, thank you very much for the suggestions. I tried them both and they work fine, creating the relevant numbering. However, note that with each new directory the counter starts over again. What is the appropriate way to make it keep counting until the last file (e.g., 99999.json)?
print(newfilepath) # example utilizing zivoni's code (nilamo's works in a similar fashion)
/Users/mymac/Documents/Jupyter/Twitter/1 day test/00/13/058.json
/Users/mymac/Documents/Jupyter/Twitter/1 day test/00/13/059.json
/Users/mymac/Documents/Jupyter/Twitter/1 day test/00/13/060.json
/Users/mymac/Documents/Jupyter/Twitter/1 day test/00/14/001.json
/Users/mymac/Documents/Jupyter/Twitter/1 day test/00/14/002.json
/Users/mymac/Documents/Jupyter/Twitter/1 day test/00/14/003.json
Posts: 3,458
Threads: 101
Joined: Sep 2016
Instead of setting the file_counter (or whatever) inside the for loop, set it once before either loop starts.
Posts: 2,955
Threads: 48
Joined: Sep 2016
for file_number, file_name in enumerate(files, 1):
    new_filename = "{:03d}.json".format(file_number)
Use this if you want file names to start from 1. If not, use enumerate(files).
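A small sketch of what the second argument to enumerate does. The file names are hypothetical. Keep in mind that every call to enumerate starts counting afresh from the given start value.

```python
files = ["a.json.bz2", "b.json.bz2", "c.json.bz2"]  # hypothetical names

# enumerate(files, 1) yields (1, "a.json.bz2"), (2, "b.json.bz2"), ...
one_based = ["{:03d}.json".format(n) for n, _ in enumerate(files, 1)]

# enumerate(files) starts at 0 instead
zero_based = ["{:03d}.json".format(n) for n, _ in enumerate(files)]

print(one_based)   # ['001.json', '002.json', '003.json']
print(zero_based)  # ['000.json', '001.json', '002.json']
```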
Posts: 3,458
Threads: 101
Joined: Sep 2016
But then it's re-starting for every directory. OP said they want to maintain counts across directories.
Posts: 70
Threads: 17
Joined: Feb 2017
Mar-29-2017, 07:05 PM
(This post was last modified: Mar-29-2017, 07:05 PM by kiton.)
Yep, I keep trying different variations of the code, but the numbering always restarts in each new directory. I'd like there to be no "duplicate" names, just consecutive numbers, if possible.
Code variations tried:
# nilamo's version
for dirpath, dirname, files in os.walk('/Users/mymac/Documents/Jupyter/Twitter/1 day test'):
    file_counter = 0
    for filename in files:
        file_counter += 1 # tried other placements, too
        if filename.endswith('.json.bz2'):
            filepath = os.path.join(dirpath, filename)
            newfilepath = os.path.join(dirpath, "{0}.json".format(file_counter))
            print(newfilepath)
            with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
                for data in iter(lambda : file.read(100 * 1024), b''):
                    new_file.write(data)
and
# zivoni's version
for dirpath, dirname, files in os.walk('/Users/mymac/Documents/Jupyter/Twitter/1 day test'):
    file_number = 1
    for filename in files:
        if filename.endswith('.json.bz2'):
            filepath = os.path.join(dirpath, filename)
            file_number += 1 # tried other placements, too
            for file_number, file_name in enumerate(files, 1):
                new_filename = "{:03d}.json".format(file_number) # you can use just str() + ".json", but format adds more options
                newfilepath = os.path.join(dirpath, new_filename)
                with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
                    for data in iter(lambda : file.read(100 * 1024), b''):
                        new_file.write(data)
Posts: 2,955
Threads: 48
Joined: Sep 2016
Mar-29-2017, 07:27 PM
(This post was last modified: Mar-29-2017, 07:29 PM by wavic.)
'counter' should be outside of the os.walk() loop
file_number = 1
for dirpath, dirname, files in os.walk('/Users/mymac/Documents/Jupyter/Twitter/1 day test'):
    for filename in files:
        if filename.endswith('.json.bz2'):
            filepath = os.path.join(dirpath, filename)
            for file_name in files:
                new_filename = "{:03d}.json".format(file_number) # you can use just str() + ".json", but format adds more options
                newfilepath = os.path.join(dirpath, new_filename)
                with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
                    for data in iter(lambda : file.read(100 * 1024), b''):
                        new_file.write(data)
                        file_number += 1 # tried other placements, too
Posts: 3,458
Threads: 101
Joined: Sep 2016
Right, the += 1 was fine in both places; what needed to change was the counter getting reset to 0 in every directory. So, instead of setting it to 0 for every directory, just do it once, before any of the for loops.
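A minimal sketch of that idea, run against a throwaway directory tree so it touches no real data. The function name number_files and the demo file names are made up for illustration; it only computes the new names and does not write anything.

```python
import os
import tempfile

def number_files(root):
    """Return (old_path, new_name) pairs, numbered across all directories."""
    file_counter = 0  # set once, before either loop starts
    pairs = []
    for dirpath, dirnames, files in os.walk(root):
        for filename in files:
            file_counter += 1  # never reset, so it carries over to the next directory
            old_path = os.path.join(dirpath, filename)
            pairs.append((old_path, "{:03d}.json".format(file_counter)))
    return pairs

# Demo: two subdirectories with two empty files each.
with tempfile.TemporaryDirectory() as root:
    for sub in ("13", "14"):
        os.mkdir(os.path.join(root, sub))
        for name in ("a.json.bz2", "b.json.bz2"):
            open(os.path.join(root, sub, name), "wb").close()
    names = [new_name for _, new_name in number_files(root)]

print(names)  # ['001.json', '002.json', '003.json', '004.json']
```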
Posts: 331
Threads: 2
Joined: Feb 2017
Yes, as wavic posted. There is a misindent in his post on the last line (file_number += 1): it should be on the same level as newfilepath = ...
(as posted, it increases once per 100 kB read).
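To see why the placement matters, here is a small sketch that counts how often the body of the chunked read loop runs. It uses an in-memory buffer of an arbitrary, made-up size in place of a real .bz2 file.

```python
import io

CHUNK = 100 * 1024
data = io.BytesIO(b"x" * (250 * 1024))  # stand-in for one 250 kB file

chunks = 0
# Same loop shape as in the thread: read until an empty bytes object comes back.
for block in iter(lambda: data.read(CHUNK), b""):
    chunks += 1  # an increment placed here runs once per chunk, not once per file

print(chunks)  # 3
```

With the increment inside this loop, a single 250 kB file would advance the counter by 3 instead of 1, leaving gaps in the numbering.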
Posts: 70
Threads: 17
Joined: Feb 2017
I'd like to thank everybody for being so supportive and helpful. The task has been accomplished with the following code:
import bz2
import os

file_counter = 0
for dirpath, dirname, files in os.walk('/Users/mymac/Documents/Jupyter/Twitter/1 day test'):
    for filename in files:
        file_counter += 1  # counts every file, including ones that are not .json.bz2
        if filename.endswith('.json.bz2'):
            filepath = os.path.join(dirpath, filename)
            newfilepath = os.path.join(dirpath, "{0}.json".format(file_counter))
            print(newfilepath)
            # decompress in 100 kB chunks
            with open(newfilepath, 'wb') as new_file, bz2.BZ2File(filepath, 'rb') as file:
                for data in iter(lambda : file.read(100 * 1024), b''):
                    new_file.write(data)
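For later readers, a possible variation on the code above, offered as a sketch rather than a drop-in replacement. It is not what was posted in the thread: the function name decompress_tree and the width parameter are made up, it numbers only the .json.bz2 files (so there are no gaps), zero-fills the names, refuses to overwrite existing files by opening with mode 'xb', and uses shutil.copyfileobj in place of the manual chunk loop. The demo runs on a temporary directory.

```python
import bz2
import os
import shutil
import tempfile

def decompress_tree(root, width=5):
    """Decompress every *.json.bz2 under root into a sequentially numbered .json file."""
    file_counter = 0  # set once, before either loop
    written = []
    for dirpath, dirnames, files in os.walk(root):
        dirnames.sort()  # visit subdirectories in a predictable order
        for filename in sorted(files):
            if not filename.endswith(".json.bz2"):
                continue  # skip other files without using up a number
            file_counter += 1
            source = os.path.join(dirpath, filename)
            target = os.path.join(dirpath, str(file_counter).zfill(width) + ".json")
            # Mode "xb" raises FileExistsError instead of silently overwriting.
            with bz2.BZ2File(source, "rb") as src, open(target, "xb") as dst:
                shutil.copyfileobj(src, dst, 100 * 1024)
            written.append(target)
    return written

# Demo on a temporary tree: two directories, two archives each, plus an unrelated file.
with tempfile.TemporaryDirectory() as root:
    for sub in ("13", "14"):
        os.mkdir(os.path.join(root, sub))
        for name in ("a.json.bz2", "b.json.bz2"):
            with bz2.BZ2File(os.path.join(root, sub, name), "wb") as f:
                f.write(b'{"id": 1}')
        open(os.path.join(root, sub, "notes.txt"), "w").close()
    result = [os.path.relpath(p, root) for p in decompress_tree(root)]
    with open(os.path.join(root, result[0]), "rb") as f:
        first_content = f.read()

print(result)
```

Running it twice on the same tree raises FileExistsError on the first name that already exists, which is the intended safeguard against overwriting.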