Python Forum
How to concatenate files while looping through lists? - Printable Version


How to concatenate files while looping through lists? - python_newbie09 - Mar-20-2019

I have a few hundred files in one folder, and every file in the folder has 2 corresponding files that it needs to be combined with. I have written the code that finds the corresponding files and adds them to a list, but I am now stuck on how to combine these files while looping through each list.

import re

f = ['a_b_c_111.hdf', 'b_b_c_111.hdf', 'b_c_e_112.hdf', 'c_c_e_112.hdf']
file_to_combine = {}

for file in f:
    # split e.g. 'a_b_c_111.hdf' into its four underscore-separated parts
    a, b, c, d = re.split(r'[_]', file)
    s = c + '_' + d  # grouping key, e.g. 'c_111.hdf'
    if s in file_to_combine:
        file_to_combine[s].append(file)
    else:
        file_to_combine[s] = [file]

for k, v in file_to_combine.items():
    # v is the list of files that belong together
    print(v)

Running this script prints 2 lists here, but there will be many more once I run it over hundreds of files:

['a_b_c_111.hdf', 'b_b_c_111.hdf']
['b_c_e_112.hdf', 'c_c_e_112.hdf']

I am now stuck on finding a way to loop through these lists and concatenate the hdf files within each list.
I would appreciate some help on how best to do this. Thanks.


RE: How to concatenate files while looping through lists? - Yoriz - Mar-20-2019

This might be useful: HDF5 for Python (the h5py package)
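
The "loop through each list" part of the question can be sketched without touching HDF at all. The following is only a minimal sketch under assumptions: `group_by_suffix`, `combine_groups`, `print_plan` and the `combined_` output naming are hypothetical names made up for illustration, and the actual merging is left to a `combine` callable you supply (for example one built on an HDF library), because the internal layout of the files is not shown in the thread.

```python
from collections import defaultdict


def group_by_suffix(filenames):
    """Group names like 'a_b_c_111.hdf' by their last two parts ('c_111.hdf')."""
    groups = defaultdict(list)
    for name in filenames:
        key = '_'.join(name.split('_')[2:4])
        groups[key].append(name)
    return dict(groups)


def combine_groups(groups, combine):
    """Call combine(sources, output_name) once per group; return the output names."""
    outputs = []
    for key, sources in groups.items():
        output_name = 'combined_' + key  # hypothetical naming scheme
        combine(sources, output_name)
        outputs.append(output_name)
    return outputs


def print_plan(sources, output_name):
    # Placeholder only: replace with real HDF merging code.
    print(output_name, '<-', sources)


files = ['a_b_c_111.hdf', 'b_b_c_111.hdf', 'b_c_e_112.hdf', 'c_c_e_112.hdf']
groups = group_by_suffix(files)
combine_groups(groups, print_plan)
```

Keeping the grouping separate from the merging means the loop can be checked with a placeholder such as `print_plan` before any file is written.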


RE: How to concatenate files while looping through lists? - DeaD_EyE - Mar-20-2019

Maybe this helps a little bit to understand.

import pathlib
from collections import defaultdict
from itertools import groupby

def get_group(file):
    """
    Returns the group as a str
    """
    return '_'.join(file.name.split('_')[2:4])

def file_order(file):
    # The test files used numbers for a and b, hence the int conversion.
    # For non-numeric prefixes, use this line instead:
    # a, b = file.name.split('_')[:2]
    a, b = map(int, file.name.split('_')[:2])
    return a, b
    
def grouped_files(doc_root):
    doc_root = pathlib.Path(doc_root)
    grouped = defaultdict(list)
    iterator = doc_root.glob('*_*_*_*.hdf')
    for group, items in groupby(iterator, key=get_group):
        for item in items:
            grouped[group].append(item)
    for files in grouped.values():
        # in-place sort, which mutates the list
        files.sort(key=file_order)       
    return grouped

If the order of the hdf files within each group doesn't matter, you can remove the sort code.
On the other hand, sorting the files by group first can simplify the code.

def grouped_files(doc_root):
    doc_root = pathlib.Path(doc_root)
    grouped = {}
    iterator = doc_root.glob('*_*_*_*.hdf')
    sorted_files = sorted(iterator, key=get_group)
    for group, items in groupby(sorted_files, key=get_group):
        # because the list is sorted by group,
        # each group occurs only once
        # items is an iterator, so you have to consume it
        grouped[group] = list(items)
        # if sorting of the hdf files is required:
        # grouped[group] = sorted(items, key=file_order)
    return grouped
https://docs.python.org/3/library/stdtypes.html#list.sort
https://docs.python.org/3/library/functions.html#sorted
https://docs.python.org/3/library/itertools.html#itertools.groupby
https://docs.python.org/3/library/pathlib.html#module-pathlib
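
To illustrate why the second version sorts before grouping: `itertools.groupby` only merges consecutive items that share a key, so unsorted input can yield the same group more than once. The snippet below is a small sketch using plain strings instead of `Path` objects; the file names and the helper `key` are made up for the demonstration.

```python
from itertools import groupby


def key(name):
    # same grouping rule as get_group, but for plain strings
    return '_'.join(name.split('_')[2:4])


names = ['1_1_c_111.hdf', '1_2_e_112.hdf', '2_1_c_111.hdf']

# Without sorting, 'c_111.hdf' appears as two separate runs.
unsorted_runs = [(k, list(g)) for k, g in groupby(names, key=key)]

# After sorting by the same key, each group appears exactly once.
sorted_runs = [(k, list(g)) for k, g in groupby(sorted(names, key=key), key=key)]

print(len(unsorted_runs), len(sorted_runs))
```

This is also why the first version collects the runs in a `defaultdict`: appending to an existing list merges repeated runs of the same group again.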


RE: How to concatenate files while looping through lists? - python_newbie09 - Mar-24-2019

Thank you. I will try this out.