Python Forum
Grouping Integers
#1
Hi

I am trying to search a data set for rows where the string column is "s" and create another column with the value y1, y2, y3, etc. for each cluster of consecutive "s" rows. I will then create a new data set by summing the time for y1, y2, y3, etc. I am new to Python, so I am struggling with how to start. I feel like itertools might be what I need. I appreciate any help.


Time (min)  String  New Column
2           s       y1
4           s       y1
2           s       y1
3           x
3           x
5           x
6           s       y2
2           s       y2
4           s       y2
#2
itertools.groupby can help you, but I recommend looking at the pandas package,
which is extremely powerful for grouping and filtering data.
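
As a rough illustration of the itertools.groupby route, here is a minimal sketch using the sample rows from the question. The variable names (`rows`, `labels`, `totals`) are made up for this example, and it assumes the data is already a list of (time, string) pairs in row order.

```python
from itertools import groupby

# Sample rows from the question: (time in minutes, string)
rows = [(2, "s"), (4, "s"), (2, "s"), (3, "x"), (3, "x"), (5, "x"),
        (6, "s"), (2, "s"), (4, "s")]

labels = []   # one label per row ("" for rows that are not "s")
totals = {}   # label -> summed time for that cluster
cluster = 0

# groupby yields one group per run of consecutive rows with the same key
for key, run in groupby(rows, key=lambda row: row[1]):
    run = list(run)
    if key == "s":
        cluster += 1
        label = f"y{cluster}"
        totals[label] = sum(time for time, _ in run)
    else:
        label = ""
    labels.extend([label] * len(run))

print(labels)
print(totals)
```

Note that groupby only merges consecutive equal keys, which is exactly the "cluster" behaviour asked for; it does not sort or regroup the data.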
#3
Thanks scidam

I am using pandas, but I can't figure out how to group the rows the way I want.

It seems that I should be using groupby. Is that correct?
#4
Here is an example of the data set. I am trying to find clusters of adjacent rows where rig_sub_state_unitless is 'Unknown' and rig_super_state_unitless is 'Out of Hole', and create a new column with the value group1, group2, group3, etc. for each cluster it finds.

YYYY/MM/DD hh:mm:ss  rig_sub_state_unitless  rig_super_state_unitless
2018/11/28 06:00:00  Unknown                 Out of Hole
2018/11/28 06:00:01  Unknown                 Out of Hole
2018/11/28 06:00:02  Unknown                 Out of Hole
2018/11/28 06:00:03  Unknown                 Out of Hole
2018/11/28 06:00:04  Unknown                 Out of Hole
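
One possible way to express this in pandas, as a sketch only: flag the rows where both columns match, mark the first row of each run of flagged rows, and take a running count of those starts. The 'Drilling' / 'In Hole' row is invented here just to split the sample into two clusters, and the `group` column name is an assumption.

```python
import pandas as pd

# Two columns from the post; the middle row is made up to break the run
df = pd.DataFrame({
    "rig_sub_state_unitless": ["Unknown", "Unknown", "Drilling", "Unknown", "Unknown"],
    "rig_super_state_unitless": ["Out of Hole", "Out of Hole", "In Hole", "Out of Hole", "Out of Hole"],
})

# True where the row has the combination we are looking for
match = (df["rig_sub_state_unitless"] == "Unknown") & (
    df["rig_super_state_unitless"] == "Out of Hole"
)

# A cluster starts where a row matches and the previous row did not
starts = match & ~match.shift(fill_value=False)

# Running count of starts gives the cluster number; blank out non-matching rows
group_no = starts.cumsum()
df["group"] = ("group" + group_no.astype(str)).where(match, "")

print(df)
```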
#5
Maybe this could help you:

import pandas as pd

data = {
    'string': ['s', 'x', 's', 's', 's', 'x', 'x', 'x', 's', 'x'],
    'num': [1,2,3,1,2,3,1,2,3,1]
}

df = pd.DataFrame.from_dict(data)

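count = 1  # start at 1 so the first cluster is labelled y1, as in the question

for idx, x in enumerate(df['string']):
    if x == 's':
        df.loc[idx, 'new_col'] = 'y' + str(count)
        # The cluster ends at the last row or when the next row is not 's'
        if idx + 1 == len(df) or df.loc[idx + 1, 'string'] != 's':
            count += 1

I iterate over the df['string'] series looking for a match (in this case 's') and check the value at the next index. If the next value is not 's', or there is no next row, the cluster has ended, so I add 1 to count. This relies on the default 0, 1, 2, ... index that the DataFrame gets when it is built from a dict.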
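
A loop-free variant is also possible. The sketch below is one way to do it, not the only one: it numbers every run of consecutive rows, keeps only the 's' runs, renumbers them 1, 2, 3, ... with a dense rank, and then sums `num` per label (the summing step from the original question). The names `is_s`, `run_id` and `totals` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "string": ["s", "x", "s", "s", "s", "x", "x", "x", "s", "x"],
    "num": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
})

is_s = df["string"].eq("s")

# The run id increases by 1 every time the flag changes from the previous row
run_id = is_s.ne(is_s.shift(fill_value=False)).cumsum()

# Keep only the 's' runs and renumber them consecutively (1, 2, 3, ...)
s_rank = run_id[is_s].rank(method="dense").astype(int)

# Assigning a Series aligns on the index, so non-'s' rows become NaN
df["new_col"] = "y" + s_rank.astype(str)

# groupby skips NaN keys by default, so only the 's' clusters are summed
totals = df.groupby("new_col")["num"].sum()

print(df)
print(totals)
```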