Python Forum
conditional groupby and aggregation on the conditioned group values
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
conditional groupby and aggregation on the conditioned group values
#1
ID City State Code Revenue
12 Indore MP 420 1567290
12 Indore MP 420 2009761
17 Mumbai MH 630 1567290
15 Mumbai MH 514 2009761
13 Indore MP 420 5289016
12 Mumbai MH 630 8904379


I am a newbie to Python
My data is as above. My problem statement is to group this data by keys-ID,City,State,Code and for like values of the keys the Revenue should be REvenue[i]+Revenue[j] E.g.: for ID=12, City=Indore, State=MP, Code=420 Revenue varies in the first two rows. So I wish to make the revenue as addition of both revenues for both the rows. Similarly for other rows where other keys match the revenue should be addition of therespective keys. I am unable to do this with the regular groupby+sum() on pandas . Not gettingthe expected output. Can someone please guide me with a generic code. Thank you

If the revenues are different for the like values of keys then the revenue should be as it is
Reply
#2
If your input is in csv-format, you could use the csv module.
The main point by grouping is, that you have to sort it before.
The functions sorted and itertools.groupby have key functions, which are called.
A helper function is operator.itemgetter to get items by index or name.

from operator import itemgetter


data = [(1,10), (2, 0), (3, -10), (4, -20)]
print(sorted(data, key=itemgetter(1))) # sort by second item
Output:
[(4, -20), (3, -10), (2, 0), (1, 10)]
Now let's do it with data for grouping:

from operator import itemgetter
from itertools import groupby


data = [
    (10,20,'foo'),
    (10,30,'bar'),
    (10,10,'fizz'),
    (1,3,'bar'),
    (1,3,'foo'),
]
getter = itemgetter(2) # getter for third field
sorted_list = sorted(data, key=getter)

print(sorted_list)
Output:
[(10, 30, 'bar'), (1, 3, 'bar'), (10, 10, 'fizz'), (10, 20, 'foo'), (1, 3, 'foo')]
Next step is the grouping:
from operator import itemgetter
from itertools import groupby


data = [
    (10,20,'foo'),
    (10,30,'bar'),
    (10,10,'fizz'),
    (1,3,'bar'),
    (1,3,'foo'),
]
getter = itemgetter(2) # getter for third field
sorted_list = sorted(data, key=getter)
grouped = groupby(sorted_list, getter)

for group, elements in grouped:
    print('Group ->', group)
    for element in elements:
        print(element)
Output:
Group -> bar (10, 30, 'bar') (1, 3, 'bar') Group -> fizz (10, 10, 'fizz') Group -> foo (10, 20, 'foo') (1, 3, 'foo')
If you don't sort, following happens:
from operator import itemgetter
from itertools import groupby


data = [
    (10,20,'foo'),
    (10,30,'bar'),
    (10,10,'fizz'),
    (1,3,'bar'),
    (1,3,'foo'),
]
getter = itemgetter(2) # getter for third field
grouped = groupby(data, getter)

for group, elements in grouped:
    print('Group ->', group)
    for element in elements:
        print(element)
Output:
Group -> foo (10, 20, 'foo') Group -> bar (10, 30, 'bar') Group -> fizz (10, 10, 'fizz') Group -> bar (1, 3, 'bar') Group -> foo (1, 3, 'foo')
Now the whole implementation with some tricks:
#!/usr/bin/env python3

"""
Group a csv file by a given field.
"""
import sys
from argparse import ArgumentParser
from itertools import groupby
from operator import itemgetter
from pathlib import Path
from csv import reader, writer


def group_csv_reader(file, fields):
    with file.open() as fd:
        csv = reader(fd)
        header = next(csv)
        for field in fields:
            if field not in header:
                raise ValueError(f'Only following fields are valid to group: {header}')
        indicies = [header.index(field) for field in fields]
        getter = itemgetter(*indicies)
        # the itemgetter can get one or more items from a sequence or mapping
        yield header
        csv = sorted(csv, key=getter)
        # remove sorted, to see the effect
        # this will consome the whole csv file
        # and sort it. After the sorting
        # the whole content of the file is in memory
        # if the file is smaller than 1GiB,
        # it's ok not to be memory efficient
    # now the file is closed and
    # csv still contains the sorted rows by field
    csv = groupby(csv, key=getter)
    # groupby itself does not sort
    # it just yield the group and rows
    for group, rows in csv:
        for row in rows:
            yield row


def main():
    parser = ArgumentParser(description=__doc__)
    parser.add_argument('file', type=Path, help='CSV input file to process')
    # file is a Path object you can work with
    parser.add_argument('fields', nargs='+', help='Fieldname(s) to group by')
    parser.add_argument('-o', type=Path, help='Output to a file')
    # o is a boolean
    # one or more fieldname
    args = parser.parse_args()
    if args.o:
        fout = args.o.open('w')
    else:
        fout = sys.stdout
    # now make a csv_writer, wich uses currently sys.stdout
    csv_writer = writer(fout)
    # catching the error, if a field was
    # given which does not exist
    try:
        iterator = group_csv_reader(args.file, args.fields)
    except ValueError as e:
        print(e, file=sys.stderr)
        fout.close()
        return 1
    for row in iterator:
        csv_writer.writerow(row)
    return 0


if __name__ == '__main__':
    sys.exit(main())
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Get an average of the unique values of a column with group by condition and assign it klllmmm 0 220 Feb-17-2024, 05:53 PM
Last Post: klllmmm
  Add a row per group and depending of values Menthix 0 489 Mar-20-2023, 11:20 AM
Last Post: Menthix
  Class-Aggregation and creating a list/dictionary IoannisDem 1 1,883 Oct-03-2021, 05:16 PM
Last Post: Yoriz
  What is the better way of avoiding duplicate records after aggregation in pandas ? jagasrik 0 1,699 Aug-30-2020, 05:26 PM
Last Post: jagasrik
  Using groupby on non-categorical values namy77 1 1,687 Feb-04-2020, 11:58 PM
Last Post: scidam
  Aggregation json by nested elements Omri 1 2,492 Sep-05-2018, 04:45 PM
Last Post: Larz60+
  Diffusion limited aggregation pythonic 0 2,563 Jun-11-2018, 09:45 PM
Last Post: pythonic

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020