Python Forum

Full Version: Question about the groupby function
Hi, I have the following code:

import itertools
first_letter = lambda x: x[0]
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']

for letter, names in itertools.groupby(names, first_letter):
   print(letter, list(names))
The program returned:

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']

Does anybody know why it does not return the following result instead?

A ['Alan', 'Adam', 'Albert']
W ['Wes', 'Will']
S ['Steven']
The list needs to be sorted first, because groupby only groups items that are next to each other.
Use:
>>> names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
>>> names.sort()
>>> for letter, inames in itertools.groupby(names, first_letter):
...     print(letter, list(inames))
... 
A ['Adam', 'Alan', 'Albert']
S ['Steven']
W ['Wes', 'Will']
>>>
Also, you were overwriting names: the loop variable reused the name of the list.
result:
(Feb-08-2020, 04:39 AM) new_to_python wrote: Does anybody know why it does not return the following result instead?

A ['Alan', 'Adam', 'Albert']
W ['Wes', 'Will']
S ['Steven']

In the Python interactive interpreter, type help(itertools.groupby) and note the part which says "returns consecutive keys and groups from the iterable".


>>> help(itertools.groupby)
class groupby(builtins.object)
 |  groupby(iterable, key=None)
 |  
 |  make an iterator that returns consecutive keys and groups from the iterable
 |  
 |  iterable
 |    Elements to divide into groups according to the key function.
 |  key
 |    A function for computing the group category for each element.
 |    If the key function is not specified or is None, the element itself
 |    is used for grouping.
/.../
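The word "consecutive" in that help text can be checked with a small sketch. The sample data is made up for illustration; equal items that are not adjacent end up in separate groups.

```python
import itertools

data = [1, 1, 2, 1]  # made-up sample: the last 1 is not adjacent to the first two

# Each run of equal, adjacent items becomes its own group.
runs = [(key, list(group)) for key, group in itertools.groupby(data)]
print(runs)  # [(1, [1, 1]), (2, [2]), (1, [1])]
```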
Thanks. Yes, I read that. Is the key word here "consecutive"? Am I correct that in this example, because 'Albert' is separated from ['Alan', 'Adam'] by ['Wes', 'Will'], it is not consecutive with ['Alan', 'Adam'] and, as a result, gets its own group?

(Feb-08-2020, 06:06 AM) Larz60+ wrote: The list needs to be sorted first.
Use:
>>> names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
>>> names.sort()
>>> for letter, inames in itertools.groupby(names, first_letter):
...     print(letter, list(inames))
... 
A ['Adam', 'Alan', 'Albert']
S ['Steven']
W ['Wes', 'Will']
>>>
Also, you were overwriting names.
result:

Thanks. Is it always better to use a loop variable name different from the first argument of groupby, to avoid overwriting it?

What was supposed to follow "result:"?
Reusing a name for different objects is allowed, but dangerous.
It's like you had 10 kids all named Pete, some boys and some girls!
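A minimal sketch of what the rebinding does in the original loop, assuming the same names list as in the question. groupby receives the list once, before the loop starts, so the grouping itself still works; but after the loop, names no longer refers to the list. The variables still_a_list and leftover exist only for this illustration.

```python
import itertools

first_letter = lambda x: x[0]
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']

# groupby() is called once with the original list; each iteration then
# rebinds `names` to the current group iterator.
for letter, names in itertools.groupby(names, first_letter):
    print(letter, list(names))

# After the loop, `names` is the last (already consumed) group, not the list.
still_a_list = isinstance(names, list)
leftover = list(names)
print(still_a_list, leftover)  # False []
```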
I will keep that in mind. Thanks.

So am I correct that because 'Albert' and ['Alan', 'Adam'] are separated by ['Wes', 'Will'], 'Albert' is not consecutive to ['Alan', 'Adam'] and as a result, it forms its own group?
Yes, that's why I added the sort.
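Two hedged sketches of the idea, assuming the names list from the question. The first sorts by the same key function that groupby uses; because sorted is stable, names keep their original relative order within each group. The second skips groupby and collects groups in a plain dict, which keeps the groups in first-seen order (A, W, S), as in the output the question asked for. The helper names group_sorted and group_first_seen are made up for this example.

```python
import itertools

def first_letter(name):
    return name[0]

def group_sorted(items, key):
    """Sort by the grouping key, then group; does not modify `items`."""
    ordered = sorted(items, key=key)  # stable sort keeps order within a group
    return [(k, list(g)) for k, g in itertools.groupby(ordered, key)]

def group_first_seen(items, key):
    """Group non-adjacent items with a dict, keeping first-seen key order."""
    groups = {}  # dicts preserve insertion order (Python 3.7+)
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return list(groups.items())

names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
print(group_sorted(names, first_letter))
# [('A', ['Alan', 'Adam', 'Albert']), ('S', ['Steven']), ('W', ['Wes', 'Will'])]
print(group_first_seen(names, first_letter))
# [('A', ['Alan', 'Adam', 'Albert']), ('W', ['Wes', 'Will']), ('S', ['Steven'])]
```

The dict version needs no sorting at all, which may matter when the input order should be preserved.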
The Python interactive interpreter is an excellent tool for observing how stuff works:

>>> s = 'aabbc'
>>> itertools.groupby(s)
<itertools.groupby at 0x1187f9db0>              # groupby object, not very helpful
>>> list(itertools.groupby(s))                  # let's peek inside
[('a', <itertools._grouper at 0x118816ac8>),    # groupby object is a stream of tuples where:
 ('b', <itertools._grouper at 0x1188168d0>),    #   - first element is the group name (key)
 ('c', <itertools._grouper at 0x118816940>)]    #   - second element is the group itself, as a grouper object
>>> for key, group in itertools.groupby(s):     # let's unpack it into a human-readable format
...     print(f'Group name: {key}, group: {[*group]}')
...
Group name: a, group: ['a', 'a']
Group name: b, group: ['b', 'b']
Group name: c, group: ['c']
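One caveat worth a sketch: the group objects share the underlying iterator with the groupby object, so the itertools documentation advises storing a group as a list if its data is needed later. The variable names groups and run_lengths are only illustrative.

```python
import itertools

s = 'aabbc'

# Convert each group to a list while it is still the current group,
# so the data survives after groupby has moved on.
groups = [(key, list(group)) for key, group in itertools.groupby(s)]

# The stored lists can now be reused as often as needed.
run_lengths = {key: len(items) for key, items in groups}
print(groups)       # [('a', ['a', 'a']), ('b', ['b', 'b']), ('c', ['c'])]
print(run_lengths)  # {'a': 2, 'b': 2, 'c': 1}
```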
This is the basic functionality, which might not seem very helpful. But groupby also supports a key function, which makes a lot of interesting things possible. Some examples below.

Extract numbers from a user input string:

>>> user_input = ' a34+ *2'
>>> for key, group in itertools.groupby(user_input, lambda char: char.isdigit()):  # group by "is a digit"; key is a bool, True or False
...     if key:                                                                    # only the groups of digits
...         print(int(''.join(group)))                                             # build an integer from the run of digit characters
...
34
2
>>> # as a list comprehension one-liner:
>>> [int(''.join(group)) for key, group in itertools.groupby(user_input, lambda char: char.isdigit()) if key]
[34, 2]
Split on several splitters:

>>> text = 'abcdefghijklm'
>>> splitters = ['b', 'f', 'j']             # split text on these splitters
>>> list(''.join(group) for key, group in itertools.groupby(text, lambda char: char not in splitters) if key)
['a', 'cde', 'ghi', 'klm']
By combining groupby with other itertools functions, even more 'interesting' code can be written.
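As one possible illustration (not from the thread itself), here is a sketch of run-length encoding that pairs groupby with itertools.repeat and itertools.chain. The function names rle_encode and rle_decode are made up for this example.

```python
import itertools

def rle_encode(text):
    """Return (character, run length) pairs for consecutive runs."""
    return [(char, sum(1 for _ in run)) for char, run in itertools.groupby(text)]

def rle_decode(pairs):
    """Rebuild the string by repeating each character and chaining the runs."""
    runs = (itertools.repeat(char, count) for char, count in pairs)
    return ''.join(itertools.chain.from_iterable(runs))

encoded = rle_encode('aabbc')
print(encoded)              # [('a', 2), ('b', 2), ('c', 1)]
print(rle_decode(encoded))  # aabbc
```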