Python Forum

Full Version: Group files according to first few characters in filename
I used the script below, which worked for one set of files that had the same name in different folders. For this second set of files, though, the names only share the first five characters across the folders, so I am finding it difficult to combine them. I tried using .startswith, but it is not giving me any results either.

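import os
import pprint

# path is the top-level folder that contains the subfolders
file_list = {}

for dirname in os.listdir(path):
    #print("dir:",dirname)
    for root, dirs, files in os.walk(os.path.join(path, dirname)):
        for filename in files:
            # Key on the full filename; start an empty list the first time it is seen.
            if filename not in file_list:
                file_list[filename] = []
            file_list[filename].append(os.path.join(root, filename))
print()
pprint.pprint(file_list)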
How do I improve this code so that it only looks at the first five characters of each filename before adding it to the dictionary? Thank you.
(Aug-01-2019, 10:36 AM)python_newbie09 Wrote: How do I improve this code so that it only looks at the first five characters of each filename before adding it to the dictionary?

Hello, I can't understand exactly what you want; my English is not good. Can you provide an example?
I don't know why this shouldn't work. What exactly didn't work when using 'startswith'?

if filename.startswith('...'):  # your five letters
Yeah, an example would help us understand what you really need.
(Aug-01-2019, 11:32 AM)Friend Wrote: I don't know why this shouldn't work. What exactly didn't work when using 'startswith'? Yeah, an example would help us understand what you really need.

My directory and file structure is as below:
Main_Folder
--SubFolder1
---aaa_001
---bbb_002

--SubFolder2
---aaa_002
---bbb_004

So the idea is that I want to group files based on the first 3 characters in this case, and have a dictionary structure that looks like the one below when printing file_list:

{aaa: [aaa_001, aaa_002],
bbb: [bbb_002, bbb_004]}

When using startswith, it just gives a True or False value, and the result I get is as below:

{aaa_001: [aaa_001],
aaa_002: [aaa_002],
bbb_002: [bbb_002],
bbb_004: [bbb_004]}
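
A minimal sketch of why startswith on its own does not produce the grouping: it only answers yes or no for a prefix that is already known, while a slice such as name[:3] returns the prefix itself, which can then be used as the dictionary key. The names list is hypothetical sample data built from the filenames in the example; no folders are read, and by_name and by_prefix are illustrative variable names.

```python
# Hypothetical sample data: the four filenames from the example.
names = ["aaa_001", "bbb_002", "aaa_002", "bbb_004"]

# startswith() returns True or False for one prefix that is already known.
print(names[0].startswith("aaa"))

# Keying by the whole filename gives every file its own entry.
by_name = {}
for name in names:
    by_name.setdefault(name, []).append(name)

# Keying by a slice of the filename groups files that share a prefix.
by_prefix = {}
for name in names:
    by_prefix.setdefault(name[:3], []).append(name)

print(by_name)
print(by_prefix)
```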
(Aug-01-2019, 06:39 PM)python_newbie09 Wrote: So the idea is I want to group files based on the first 3 characters in this case.

Try this

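import os

file_list = {}  # maps each 3-character prefix to a list of file paths

for root, dirs, files in os.walk(path):
    for name in files:
        identifier = name[:3]  # first three characters of the filename
        if identifier not in file_list:
            # First file with this prefix: start a new list holding its path.
            file_list[identifier] = [os.path.join(root, name)]
        else:
            # Prefix already seen: add this path to the existing list.
            file_list[identifier].append(os.path.join(root, name))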
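
The same idea can also be sketched with collections.defaultdict, which creates the empty list automatically and so needs no if/else. This is only one possible variant, not the code posted in the thread: the function name group_by_prefix and its parameters top and length are hypothetical.

```python
import os
from collections import defaultdict


def group_by_prefix(top, length=3):
    """Map the first `length` characters of each filename to its full paths."""
    groups = defaultdict(list)  # a missing key starts out as an empty list
    for root, dirs, files in os.walk(top):
        for name in files:
            groups[name[:length]].append(os.path.join(root, name))
    return dict(groups)  # plain dict, convenient for pprint
```

Under these assumptions, group_by_prefix(path, 5) would group on the first five characters, as asked in the first post.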
(Aug-01-2019, 08:23 PM)cvsae Wrote: Try this

Brilliant! Thank you. But if I may ask, how is this script different from the other one? I understand that you added the identifier to look up the first few characters, but I don't really understand what is going on in the if/else condition. I would appreciate your explanation. Thanks!
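
One way to see what the if/else does is to trace it on a few names. This is only a sketch, using hypothetical in-memory sample names instead of real folders and a hypothetical trace list to record the branch taken. The if branch runs the first time a prefix is seen and creates the list; the else branch runs for every later file with the same prefix and adds to that list. In the earlier script the key was the whole filename, so every file counted as new and got its own entry.

```python
# Hypothetical sample names; no folders are read in this sketch.
names = ["aaa_001", "bbb_002", "aaa_002", "bbb_004"]

file_list = {}
trace = []  # records which branch handled each name

for name in names:
    identifier = name[:3]
    if identifier not in file_list:
        # First file with this prefix: create the list.
        file_list[identifier] = [name]
        trace.append((name, "if"))
    else:
        # Prefix already present: add to the existing list.
        file_list[identifier].append(name)
        trace.append((name, "else"))

for name, branch in trace:
    print(name, "->", branch)
print(file_list)
```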
Observation: I would avoid misleading names. It is not good to have the name file_list for a dictionary.
(Aug-02-2019, 05:56 AM)perfringo Wrote: Observation: I would avoid misleading names. Not good to have name file_list for dictionary.

You are right