Python Forum

Full Version: How to concatenate files while looping through lists?
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
I have a few hundreds of files in one folder and every file in the folder has 2 more corresponding files that it needs to combine with. I have done the code for finding the corresponding files and adding them into a list but I am now stuck at figuring out how to combine these files while looping through each list.

f = f = ['a_b_c_111.hdf', 'b_b_c_111.hdf', 'b_c_e_112.hdf','c_c_e_112.hdf']
file_to_combine = {}

for file in f:
    a,b,c,d = re.split(r'[_]',file)
    s = c + '_' + d
    if s in file_to_combine:        
        file_to_combine[s].append(os.path.join(file))
    else:
        file_to_combine[s] =[os.path.join(file)]

for (k, v) in file_to_combine.items():    
    files = [','.join(v)]
    
    for i in files:
        split_files = files[0].split(",")
        print (split_files) 
running this script will result in 2 lists but it will be more when I run through 100s of files:

['a_b_c_111.hdf', 'b_b_c_111.hdf']
['b_c_e_112.hdf', 'c_c_e_112.hdf']

I am now stuck and finding a way to loop through this list and concatenating hdf files within each of this list.
Would appreciate some help on how best to do this. Thanks.
This might be useful HDF5 for Python
Maybe this helps a little bit to understand.

import pathlib
from collections import defaultdict
from itertools import groupby

def get_group(file):
    """
    Returns the group as a str
    """
    return '_'.join(file.name.split('_')[2:4])

def file_order(file):
    a, b = map(int, file.name.split('_')[:2])
    # I am using numbers for a and b in test code
    # 
    #a, b = file.name.split('_')[:2]
    return a, b
    
def grouped_files(doc_root):
    doc_root = pathlib.Path(doc_root)
    grouped = defaultdict(list)
    iterator = doc_root.glob('*_*_*_*.hdf')
    for group, items in groupby(iterator, key=get_group):
        for item in items:
            grouped[group].append(item)
    for files in grouped.values():
        # doing an inline sort, which mutates the
        # list
        files.sort(key=file_order)       
    return grouped
If the order of the hdf_files in each group doesn't matter, you can remove the sort code.
On the other side, sorting can simplify the code.

def grouped_files(doc_root):
    doc_root = pathlib.Path(doc_root)
    grouped = {}
    iterator = doc_root.glob('*_*_*_*.hdf')
    sorted_files = sorted(iterator, key=get_group)
    for group, items in groupby(sorted_files, key=get_group):
        # because of the sorted list by group,
        # each group occours only once
        # items is an interator, you have to consume the iterator
        grouped[group] = list(items)
        # if sorting of hdf_files is required
        # grouped[group] = sorted(items, key=file_order)
    return grouped
https://docs.python.org/3/library/stdtypes.html#list.sort
https://docs.python.org/3/library/functions.html#sorted
https://docs.python.org/3/library/iterto...ls.groupby
https://docs.python.org/3/library/pathli...le-pathlib
thank you. i will try this out.