 How to concatenate files while looping through lists?
I have a few hundred files in one folder, and every file in the folder has 2 more corresponding files that it needs to be combined with. I have written the code that finds the corresponding files and adds them to a list, but I am now stuck on how to combine these files while looping through each list.

import re

f = ['a_b_c_111.hdf', 'b_b_c_111.hdf', 'b_c_e_112.hdf', 'c_c_e_112.hdf']
file_to_combine = {}

for file in f:
    # name pattern: <a>_<b>_<c>_<d>.hdf; files sharing <c>_<d> belong together
    a, b, c, d = re.split(r'[_]', file)
    s = c + '_' + d
    # create the list the first time a key is seen, then append to it
    file_to_combine.setdefault(s, []).append(file)

for k, v in file_to_combine.items():
    print(v)
Running this script results in 2 lists, but there will be more when I run it on hundreds of files:

['a_b_c_111.hdf', 'b_b_c_111.hdf']
['b_c_e_112.hdf', 'c_c_e_112.hdf']

I am now stuck on finding a way to loop through these lists and concatenate the hdf files within each list.
I would appreciate some help on how best to do this. Thanks.
This might be useful: HDF5 for Python (h5py)
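Whichever library ends up doing the actual merge, the loop over the groups can be kept separate from it. Here is a minimal sketch, assuming file_to_combine maps each key to a list of file names as in the question. The names combine_all and merge_func and the combined_<key>.hdf output name are made up for illustration. merge_func stands in for whatever HDF-specific code you write yourself (for example with h5py, if the files are HDF5).

```python
import os


def combine_all(file_to_combine, merge_func, out_dir="."):
    """Call merge_func(sources, target) once per group and return the targets.

    merge_func is a placeholder: it has to be written for the kind of
    HDF file you have (HDF4 or HDF5) and is not part of any library.
    """
    targets = []
    for key, sources in file_to_combine.items():
        # hypothetical naming scheme: drop a trailing extension from the key
        base = os.path.splitext(key)[0]
        target = os.path.join(out_dir, f"combined_{base}.hdf")
        merge_func(sorted(sources), target)
        targets.append(target)
    return targets


if __name__ == "__main__":
    groups = {
        "c_111.hdf": ["a_b_c_111.hdf", "b_b_c_111.hdf"],
        "e_112.hdf": ["b_c_e_112.hdf", "c_c_e_112.hdf"],
    }
    # dry run: only print what would be merged, nothing is written
    combine_all(groups, lambda sources, target: print(sources, "->", target))
```

Passing the merge step in as a function makes it easy to do a dry run with print first and swap in the real merge once the grouping looks right.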
Maybe this helps a little bit to understand.

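import pathlib
from collections import defaultdict
from itertools import groupby

def get_group(file):
    """Returns the group as a str"""
    # third and fourth field of the file name, e.g. 'c_111'
    return '_'.join(file.stem.split('_')[2:4])

def file_order(file):
    a, b = map(int, file.stem.split('_')[:2])
    # I am using numbers for a and b in test code
    #a, b = file.stem.split('_')[:2]
    return a, b

def grouped_files(doc_root):
    doc_root = pathlib.Path(doc_root)
    grouped = defaultdict(list)
    iterator = doc_root.glob('*_*_*_*.hdf')
    # glob order is not sorted by group, so a group can occur more than once;
    # appending to the defaultdict collects all of its files anyway
    for group, items in groupby(iterator, key=get_group):
        for item in items:
            grouped[group].append(item)
    for files in grouped.values():
        # doing an inline sort, which mutates the
        # list
        files.sort(key=file_order)
    return grouped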
If the order of the hdf_files in each group doesn't matter, you can remove the sort code.
On the other hand, sorting the files by group first can simplify the code.

def grouped_files(doc_root):
    doc_root = pathlib.Path(doc_root)
    grouped = {}
    iterator = doc_root.glob('*_*_*_*.hdf')
    sorted_files = sorted(iterator, key=get_group)
    for group, items in groupby(sorted_files, key=get_group):
        # because the list is sorted by group,
        # each group occurs only once
        # items is an iterator, you have to consume the iterator
        grouped[group] = list(items)
        # if sorting of hdf_files is required
        # grouped[group] = sorted(items, key=file_order)
    return grouped
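Here is a small demonstration of why the sort matters. itertools.groupby only merges neighbouring items with the same key, so on unsorted input a group can show up more than once. The file names and the group_key helper are made up for this sketch and work on plain strings instead of pathlib paths.

```python
from itertools import groupby


def group_key(name):
    # third and fourth field of e.g. 'a_b_c_111.hdf' -> 'c_111'
    stem = name.rsplit(".", 1)[0]
    return "_".join(stem.split("_")[2:4])


names = ["a_b_c_111.hdf", "b_c_e_112.hdf", "b_b_c_111.hdf", "c_c_e_112.hdf"]

# unsorted input: the same key is reported several times
unsorted_keys = [key for key, _ in groupby(names, key=group_key)]

# input sorted by the same key: every key is reported exactly once
sorted_groups = {
    key: list(items)
    for key, items in groupby(sorted(names, key=group_key), key=group_key)
}

print(unsorted_keys)
print(sorted_groups)
```

With the unsorted list, assigning grouped[group] = list(items) would overwrite earlier files of the same group, which is why the first version appends to a defaultdict instead.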
My code examples are always for Python >=3.6.0
Thank you. I will try this out.
