Feb-11-2020, 11:31 PM
(This post was last modified: Feb-11-2020, 11:31 PM by Gribouillis.)
What you could do is transform the rows at input by defining a 'merging plan' that depends on the input headers. For example, suppose that the input file has the headers
['header2', 'header5', 'header3', 'spam', 'header10', 'header13']. Then the merging plan would be: combine header2 and header3 to make a column newheaderA, change the column header5 into a column newheaderC, leave the spam column unchanged, and combine header10 and header13 into a column newheaderD. This plan can be represented by the Python list
[('newheaderA', ['header2', 'header3']), ('newheaderC', ['header5']), ('spam', ['spam']), ('newheaderD', ['header10', 'header13'])]

The following code shows how one can automatically compute the merging plan from the input headers and how one can transform the input rows according to this plan. After that you can output the new rows as if they were the actual input rows, which you already know how to do.
I'm using the function more_itertools.unique_everseen(). If you don't want to import more_itertools, you can simply copy the implementation of unique_everseen that is given at the end of the official documentation page of module itertools.

```python
from more_itertools import unique_everseen

# Each rule maps a new header to the old headers it may absorb.
rules = [
    ('newheaderA', ['header1', 'header2', 'header3']),
    ('newheaderB', ['header4']),
    ('newheaderC', ['header5']),
    ('newheaderD', ['header6', 'header7', 'header8', 'header9', 'header10',
                    'header11', 'header12', 'header13', 'header14']),
]

# Reverse lookup: old header -> new header.
inverse_rules = {old: new for new, olds in rules for old in olds}
drules = dict(rules)

def merging_plan(headers):
    headers = list(headers)
    # New headers in order of first appearance; unknown headers map to themselves.
    news = list(unique_everseen(inverse_rules.get(h, h) for h in headers))
    s = set(headers)
    plan = []
    for new in news:
        # Keep only the old headers that are actually present in the input.
        plan.append((new, [old for old in drules.get(new, [new]) if old in s]))
    return plan

def merge(plan, row):
    # Join the values of the merged columns with a single space.
    return {k: ' '.join(row[x] for x in v) for k, v in plan}

def main():
    # compute the merging plan for a given sequence of input headers
    headers = ['header2', 'header5', 'header3', 'spam', 'header10', 'header13']
    plan = merging_plan(headers)
    print(plan)
    # compute the merged row corresponding to an input row
    r = {'header2': 'v2', 'header5': 'v5', 'header3': 'v3',
         'spam': 'vspam', 'header10': 'v10', 'header13': 'v13'}
    print(merge(plan, r))

if __name__ == '__main__':
    main()
```
Output:
[('newheaderA', ['header2', 'header3']), ('newheaderC', ['header5']), ('spam', ['spam']), ('newheaderD', ['header10', 'header13'])]
{'spam': 'vspam', 'newheaderD': 'v10 v13', 'newheaderC': 'v5', 'newheaderA': 'v2 v3'}
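For completeness, here is a rough sketch of how the same idea might be wired to the csv module using only the standard library. Two things in it are my own assumptions and not part of the code above: the local unique_everseen is a simplified stand-in (no key argument, hashable items only) rather than the more_itertools version, and convert() is a hypothetical helper that guesses you read with csv.DictReader and write with csv.DictWriter. The sample data is made up.

```python
import csv
import io

# Same rules as above: new header -> old headers it may absorb.
rules = [
    ('newheaderA', ['header1', 'header2', 'header3']),
    ('newheaderB', ['header4']),
    ('newheaderC', ['header5']),
    ('newheaderD', ['header6', 'header7', 'header8', 'header9', 'header10',
                    'header11', 'header12', 'header13', 'header14']),
]
inverse_rules = {old: new for new, olds in rules for old in olds}
drules = dict(rules)

def unique_everseen(iterable):
    # Simplified stand-in (assumption): yield each hashable element once,
    # in order of first appearance. No key argument.
    seen = set()
    for element in iterable:
        if element not in seen:
            seen.add(element)
            yield element

def merging_plan(headers):
    headers = list(headers)
    news = list(unique_everseen(inverse_rules.get(h, h) for h in headers))
    s = set(headers)
    return [(new, [old for old in drules.get(new, [new]) if old in s])
            for new in news]

def merge(plan, row):
    return {k: ' '.join(row[x] for x in v) for k, v in plan}

def convert(infile, outfile):
    # Hypothetical helper: read CSV rows, write the merged rows.
    reader = csv.DictReader(infile)
    plan = merging_plan(reader.fieldnames)
    writer = csv.DictWriter(outfile, fieldnames=[new for new, _ in plan],
                            lineterminator='\n')
    writer.writeheader()
    for row in reader:
        writer.writerow(merge(plan, row))

# Made-up sample input, held in memory instead of a real file.
sample = ("header2,header5,header3,spam,header10,header13\n"
          "v2,v5,v3,vspam,v10,v13\n")
out = io.StringIO()
convert(io.StringIO(sample), out)
result = out.getvalue()
print(result)
```

With real files you would pass file objects opened with newline='' instead of the StringIO objects. The plan is computed once from reader.fieldnames, so each row costs only one call to merge().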