Python Forum
Group files according to first few characters in filename
#1
I used the script below, which worked for one set of files that had the same names in different folders. In this second set of files, however, the names share only their first five characters across the folders, so I am finding it difficult to combine these files. I tried using .startswith, but that is not giving me any results either.

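import os
import pprint

path = "Main_Folder"  # hypothetical: the top-level folder to scan
file_list = {}  # maps each filename to every full path where it occurs

for dirname in os.listdir(path):
    # os.path.join adds the separator that path + dirname would miss
    for root, dirs, files in os.walk(os.path.join(path, dirname)):
        for filename in files:
            if filename not in file_list:
                file_list[filename] = []
            file_list[filename].append(os.path.join(root, filename))
print()
pprint.pprint(file_list)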
How do I change this code so that it uses only the first five characters of each filename as the key before adding the file to the dictionary? Thank you.
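A minimal sketch of the slicing idea, using the sample names from post #4 plus a hypothetical short name "ab": filename[:5] takes the first five characters, and a name shorter than five characters comes back whole rather than raising an error. Note that for names like these, five characters include the underscore and the first digit, so a shorter prefix may be what is actually wanted.

```python
# Sample names from post #4, plus a hypothetical short name "ab"
names = ["aaa_001", "aaa_002", "bbb_002", "ab"]

# name[:5] is the first five characters; shorter names are returned unchanged
keys = [name[:5] for name in names]
print(keys)  # ['aaa_0', 'aaa_0', 'bbb_0', 'ab']
```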
#2
(Aug-01-2019, 10:36 AM) python_newbie09 wrote: (see post #1)

Hello, I can't understand exactly what you want; my English is not good. Can you provide an example?
#3
I don't see why this shouldn't work. What exactly didn't work when you used 'startswith'?

if filename.startswith('...'):         # replace '...' with your five letters
Yes, an example would help us understand what you really need.
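A small sketch of what startswith can and cannot do, using the sample names that appear later in this thread: it returns True or False for a prefix you already know, which is enough to filter for one prefix, but it does not by itself produce a key to group on.

```python
names = ["aaa_001", "bbb_002", "aaa_002", "bbb_004"]  # sample names from post #4

# startswith() only answers yes/no for a prefix you already know
print("aaa_001".startswith("aaa"))  # True
print("bbb_002".startswith("aaa"))  # False

# That is enough to filter for one known prefix, but not to build groups
aaa_files = [name for name in names if name.startswith("aaa")]
print(aaa_files)  # ['aaa_001', 'aaa_002']
```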
#4
(Aug-01-2019, 11:32 AM) Friend wrote: (see post #3)

My directory and file structure is as follows:
Main_Folder
--SubFolder1
---aaa_001
---bbb_002

--SubFolder2
---aaa_002
---bbb_004

The idea is to group the files by their first 3 characters (in this case), so that printing file_list shows a dictionary like this:

{'aaa': ['aaa_001', 'aaa_002'],
 'bbb': ['bbb_002', 'bbb_004']}

When I use startswith, it just gives a True or False value, and the result I get is this:

{'aaa_001': ['aaa_001'],
 'aaa_002': ['aaa_002'],
 'bbb_002': ['bbb_002'],
 'bbb_004': ['bbb_004']}
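The result above is what you get when the dictionary key is the whole filename: every name is unique, so every list has exactly one entry. A small sketch (the helper group is hypothetical) shows that only the key changes between the unwanted result and the wanted one:

```python
def group(names, key):
    """Group names into a dict of lists, using key(name) as the dict key."""
    groups = {}
    for name in names:
        # setdefault returns the existing list, or inserts and returns a new one
        groups.setdefault(key(name), []).append(name)
    return groups

names = ["aaa_001", "bbb_002", "aaa_002", "bbb_004"]  # sample names from this post

by_full_name = group(names, lambda name: name)   # one single-item list per filename
by_prefix = group(names, lambda name: name[:3])  # one list per 3-character prefix
print(by_full_name)
print(by_prefix)  # {'aaa': ['aaa_001', 'aaa_002'], 'bbb': ['bbb_002', 'bbb_004']}
```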
#5
(Aug-01-2019, 06:39 PM) python_newbie09 wrote: (see post #4)

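Try this:

import os

path = "Main_Folder"  # hypothetical: the top-level folder to scan (see post #4)
file_list = {}        # maps each 3-character prefix to a list of full paths

for root, dirs, files in os.walk(path):
    for name in files:
        identifier = name[:3]  # the first three characters of the filename
        if identifier not in file_list:
            # first file with this prefix: start a new list holding its path
            file_list[identifier] = [os.path.join(root, name)]
        else:
            # prefix already seen: add this path to the existing list
            file_list[identifier].append(os.path.join(root, name))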
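To try the prefix grouping without touching real data, here is a sketch that rebuilds the layout from post #4 in a temporary directory (the temporary directory is an assumption made only for this demo) and applies the same if/else logic. os.walk does not promise any particular order, so the basenames are sorted before printing.

```python
import os
import tempfile

# Recreate the folder layout from post #4 in a throwaway directory
with tempfile.TemporaryDirectory() as path:
    layout = {"SubFolder1": ["aaa_001", "bbb_002"],
              "SubFolder2": ["aaa_002", "bbb_004"]}
    for folder, names in layout.items():
        os.makedirs(os.path.join(path, folder))
        for name in names:
            open(os.path.join(path, folder, name), "w").close()  # empty file

    file_list = {}
    for root, dirs, files in os.walk(path):
        for name in files:
            identifier = name[:3]
            if identifier not in file_list:
                file_list[identifier] = [os.path.join(root, name)]
            else:
                file_list[identifier].append(os.path.join(root, name))

    # Keep only sorted basenames so the result does not depend on walk order
    result = {key: sorted(os.path.basename(p) for p in paths)
              for key, paths in file_list.items()}

print(result)
```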
#6
(Aug-01-2019, 08:23 PM) cvsae wrote: (see post #5)

Brilliant! Thank you. If I may ask, how is this script different from the other one? I understand you added identifier to pick out the first few characters, but I don't really understand what is going on in the if/else. I would appreciate an explanation. Thanks!
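The main difference from the first script is the dictionary key: that script used the whole filename, so every file got its own entry, while this one uses identifier = name[:3]. The if/else decides whether that key exists yet: the if branch creates a new list the first time a prefix appears, and the else branch appends to the list that is already there. A trace on the names from post #4 (bare filenames instead of full paths, to keep the output short) makes each step visible:

```python
names = ["aaa_001", "bbb_002", "aaa_002", "bbb_004"]  # sample names from post #4

file_list = {}
for name in names:
    identifier = name[:3]
    if identifier not in file_list:
        # first time this prefix is seen: create the key with a one-item list
        file_list[identifier] = [name]
        print(f"{name}: new key {identifier!r} -> {file_list}")
    else:
        # prefix already present: append to the list stored under it
        file_list[identifier].append(name)
        print(f"{name}: existing key {identifier!r} -> {file_list}")

# aaa_001: new key 'aaa' -> {'aaa': ['aaa_001']}
# bbb_002: new key 'bbb' -> {'aaa': ['aaa_001'], 'bbb': ['bbb_002']}
# aaa_002: existing key 'aaa' -> {'aaa': ['aaa_001', 'aaa_002'], 'bbb': ['bbb_002']}
# bbb_004: existing key 'bbb' -> {'aaa': ['aaa_001', 'aaa_002'], 'bbb': ['bbb_002', 'bbb_004']}
```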
#7
Observation: I would avoid misleading names. It is not a good idea to name a dictionary file_list.
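Following this advice, a sketch that uses a descriptive name (files_by_prefix, our own choice) together with collections.defaultdict(list), which creates an empty list for a missing key automatically and so makes the if/else from post #5 unnecessary. The function name group_by_prefix, its parameter n, and the sample paths are hypothetical.

```python
import os
from collections import defaultdict

def group_by_prefix(paths, n=3):
    """Map the first n characters of each file's basename to the paths sharing it."""
    files_by_prefix = defaultdict(list)  # a missing key starts as an empty list
    for p in paths:
        files_by_prefix[os.path.basename(p)[:n]].append(p)
    return dict(files_by_prefix)  # plain dict for printing/pprint

# Hypothetical paths mirroring the layout in post #4
paths = [
    os.path.join("Main_Folder", "SubFolder1", "aaa_001"),
    os.path.join("Main_Folder", "SubFolder1", "bbb_002"),
    os.path.join("Main_Folder", "SubFolder2", "aaa_002"),
    os.path.join("Main_Folder", "SubFolder2", "bbb_004"),
]
files_by_prefix = group_by_prefix(paths)
print(sorted(files_by_prefix))  # ['aaa', 'bbb']
```

To feed it from the file system, one option is a generator over os.walk, for example group_by_prefix(os.path.join(root, f) for root, _, files in os.walk(path) for f in files).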
#8
(Aug-02-2019, 05:56 AM) perfringo wrote: Observation: I would avoid misleading names. It is not a good idea to name a dictionary file_list.

You are right.