 How to concatenate files while looping through lists?
I have a few hundred files in one folder, and every file in the folder has 2 more corresponding files that it needs to be combined with. I have written the code that finds the corresponding files and adds them to a list, but I am now stuck on how to combine these files while looping through each list.

import re

f = ['a_b_c_111.hdf', 'b_b_c_111.hdf', 'b_c_e_112.hdf', 'c_c_e_112.hdf']
file_to_combine = {}

for file in f:
    a, b, c, d = re.split(r'[_]', file)
    s = c + '_' + d  # grouping key, e.g. 'c_111.hdf'
    if s in file_to_combine:
        file_to_combine[s].append(file)
    else:
        file_to_combine[s] = [file]

for k, v in file_to_combine.items():
    print(v)
Running this script prints 2 lists here, but there will be many more when I run it over hundreds of files:

['a_b_c_111.hdf', 'b_b_c_111.hdf']
['b_c_e_112.hdf', 'c_c_e_112.hdf']

I am now stuck on finding a way to loop through these lists and concatenate the hdf files within each of them.
I would appreciate some help on how best to do this. Thanks.
This might be useful: HDF5 for Python
Maybe this helps you understand it a little better.

import pathlib
from collections import defaultdict
from itertools import groupby

def get_group(file):
    """Returns the group as a str"""
    return '_'.join(file.stem.split('_')[2:4])

def file_order(file):
    a, b = map(int, file.stem.split('_')[:2])
    # int() works because I am using numbers for a and b in test code;
    # for names like a_b_c_111.hdf use this instead:
    # a, b = file.stem.split('_')[:2]
    return a, b

def grouped_files(doc_root):
    doc_root = pathlib.Path(doc_root)
    grouped = defaultdict(list)
    iterator = doc_root.glob('*_*_*_*.hdf')
    for group, items in groupby(iterator, key=get_group):
        for item in items:
            grouped[group].append(item)
    for files in grouped.values():
        # doing an inline sort, which mutates the
        # list
        files.sort(key=file_order)
    return grouped
If the order of the hdf files in each group doesn't matter, you can remove the sort code.
On the other hand, sorting the files by group first can simplify the code.

def grouped_files(doc_root):
    doc_root = pathlib.Path(doc_root)
    grouped = {}
    iterator = doc_root.glob('*_*_*_*.hdf')
    sorted_files = sorted(iterator, key=get_group)
    for group, items in groupby(sorted_files, key=get_group):
        # because the list is sorted by group,
        # each group occurs only once
        # items is an iterator, you have to consume the iterator
        grouped[group] = list(items)
        # if sorting of hdf_files is required
        # grouped[group] = sorted(items, key=file_order)
    return grouped
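The comment about each group occurring only once matters because groupby only merges neighbouring items that share a key. A small sketch of the difference, using plain strings instead of Path objects; group_key is a hypothetical stand-in for get_group, and the file names are the ones from the question in a shuffled order:

```python
from itertools import groupby

def group_key(name):
    # Hypothetical stand-in for get_group that works on plain strings:
    # drop the extension, then join the third and fourth fields.
    stem = name.rsplit('.', 1)[0]
    return '_'.join(stem.split('_')[2:4])

names = ['a_b_c_111.hdf', 'b_c_e_112.hdf', 'b_b_c_111.hdf', 'c_c_e_112.hdf']

# Unsorted input: groupby starts a new group every time the key changes,
# so the same key can show up more than once.
unsorted_keys = [key for key, _ in groupby(names, key=group_key)]

# Sorted by the same key: every key forms exactly one group.
sorted_groups = {
    key: list(items)
    for key, items in groupby(sorted(names, key=group_key), key=group_key)
}

print(unsorted_keys)
print(sorted_groups)
```

With unsorted input a plain dict assignment would overwrite earlier files of the same group, which is why the first version collects them in a defaultdict(list).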
My code examples are always for Python >=3.6.0
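Once the files are grouped, what remains is to loop over the groups and hand each list to whatever function does the actual merge. A minimal sketch of that loop, under some assumptions: combine_groups, the combine callback, and the '<group>_combined.hdf' output name are all hypothetical names made up for illustration, and the real merge is left out because it depends on the HDF version and on how the datasets inside the files should be joined.

```python
from pathlib import Path

def combine_groups(grouped, out_dir, combine):
    """Call combine(sources, target) once per group; return the targets.

    grouped maps a group name to a list of paths.
    combine is a user-supplied function (hypothetical) that does the merge.
    """
    out_dir = Path(out_dir)
    targets = {}
    for group, files in grouped.items():
        # Assumed naming scheme for the output file of each group.
        target = out_dir / f'{group}_combined.hdf'
        combine(list(files), target)
        targets[group] = target
    return targets

# Example data shaped like the dict a grouping function would return.
grouped = {
    'c_111': [Path('a_b_c_111.hdf'), Path('b_b_c_111.hdf')],
    'e_112': [Path('b_c_e_112.hdf'), Path('c_c_e_112.hdf')],
}

planned = []

def dry_run(sources, target):
    # Placeholder: only records what would be merged, touches no files.
    planned.append((target.name, [s.name for s in sources]))

targets = combine_groups(grouped, 'combined', dry_run)
for name, sources in planned:
    print(name, '<-', sources)
```

Replacing dry_run with a function that opens the sources and writes the target would do the real work; keeping it as a parameter lets you check the grouping first without touching any data.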
Thank you. I will try this out.
