Python Forum

Hi

I am trying to search a data set for a particular string, "s", and create another column with the value y1, y2, y3, etc. for each cluster (run of consecutive rows) of "s". I will then create a new data set by summing the time for y1, y2, y3, etc. I am new to Python, so I am struggling with how to start. I feel like itertools might be what I need. I appreciate any help.

Time (min)  String  New Column
2           s       y1
4           s       y1
2           s       y1
3           x
3           x
5           x
6           s       y2
2           s       y2
4           s       y2
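Since itertools was mentioned: itertools.groupby groups consecutive items that share a key, which is exactly what a "cluster" is here. A minimal sketch, assuming the table above is held as a plain list of (time, string) tuples; the variable names are illustrative:

```python
from itertools import groupby

# Rows from the table above: (time in minutes, string)
rows = [(2, 's'), (4, 's'), (2, 's'), (3, 'x'), (3, 'x'),
        (5, 'x'), (6, 's'), (2, 's'), (4, 's')]

labels = []   # the new column: 'y1', 'y2', ... for 's' rows, '' otherwise
totals = {}   # summed time per cluster label
cluster = 0

# groupby yields one group per run of consecutive rows with the same string
for key, run in groupby(rows, key=lambda row: row[1]):
    run = list(run)
    if key == 's':
        cluster += 1
        label = 'y' + str(cluster)
        totals[label] = sum(time for time, _ in run)
        labels.extend([label] * len(run))
    else:
        labels.extend([''] * len(run))

print(labels)
print(totals)  # {'y1': 8, 'y2': 12}
```

Because groupby only merges adjacent items, the two runs of "s" come out as separate groups (y1 and y2) rather than one.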

itertools.groupby can help you, but I recommend looking at the pandas package, which is very powerful for grouping and filtering data.
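In pandas, groupby needs a key that is different for each cluster. One common idiom, sketched here with the data from the first post (the column names time_min, string and new_col are assumptions), compares each row with the previous one using shift(), turns the change points into run ids with cumsum(), and then sums the time per cluster:

```python
import pandas as pd

# Example data from the first post (column names are illustrative)
df = pd.DataFrame({
    'time_min': [2, 4, 2, 3, 3, 5, 6, 2, 4],
    'string': ['s', 's', 's', 'x', 'x', 'x', 's', 's', 's'],
})

# True wherever the value differs from the row above (the first row always counts);
# the running total of those flags gives every run of equal values its own id.
run_id = (df['string'] != df['string'].shift()).cumsum()

# Number only the runs of 's' as 1, 2, 3, ... and build the labels y1, y2, ...
is_s = df['string'] == 's'
cluster_no = run_id[is_s].rank(method='dense').astype(int)
df.loc[is_s, 'new_col'] = 'y' + cluster_no.astype(str)

# Total time per cluster (rows without a label are dropped by groupby)
totals = df.groupby('new_col')['time_min'].sum()
print(df)
print(totals)
```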

Thanks, scidam.

I am using pandas, but I can't figure out how to group the rows the way I want.

It seems that I should be using groupby. Is that correct?

Here is an example of the data set. I am trying to find the clusters of adjacent rows where the values are 'Unknown' and 'Out of Hole', and create a new column whose value is group1, group2, group3, etc. for each group found.

YYYY/MM/DD hh:mm:ss  rig_sub_state_unitless  rig_super_state_unitless
2018/11/28 06:00:00  Unknown                 Out of Hole
2018/11/28 06:00:01  Unknown                 Out of Hole
2018/11/28 06:00:02  Unknown                 Out of Hole
2018/11/28 06:00:03  Unknown                 Out of Hole
2018/11/28 06:00:04  Unknown                 Out of Hole
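A sketch of the same run-labelling idea for this data, assuming a "cluster" means consecutive rows where rig_sub_state_unitless is 'Unknown' and rig_super_state_unitless is 'Out of Hole'. The timestamp column name and the 'Other' rows are made up only to show a break between two groups:

```python
import pandas as pd

# Hypothetical sample shaped like the posted data; 'timestamp' is an assumed
# name for the date/time column, and the 'Other' row is invented to split groups.
df = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2018/11/28 06:00:00', '2018/11/28 06:00:01', '2018/11/28 06:00:02',
        '2018/11/28 06:00:03', '2018/11/28 06:00:04', '2018/11/28 06:00:05',
    ], format='%Y/%m/%d %H:%M:%S'),
    'rig_sub_state_unitless': ['Unknown', 'Unknown', 'Other',
                               'Unknown', 'Unknown', 'Unknown'],
    'rig_super_state_unitless': ['Out of Hole', 'Out of Hole', 'Other',
                                 'Out of Hole', 'Out of Hole', 'Out of Hole'],
})

# Rows that satisfy both conditions
match = ((df['rig_sub_state_unitless'] == 'Unknown')
         & (df['rig_super_state_unitless'] == 'Out of Hole'))

# A new run starts whenever the match flag changes; number only matching runs
run_id = (match != match.shift()).cumsum()
group_no = run_id[match].rank(method='dense').astype(int)
df.loc[match, 'group'] = 'group' + group_no.astype(str)

# First and last timestamp of each group, and the time between them
bounds = df[match].groupby('group')['timestamp'].agg(['min', 'max'])
bounds['duration'] = bounds['max'] - bounds['min']
print(df)
print(bounds)
```

The duration here runs from the first to the last sample in each group; if each row stands for a full second, you may want to add one sample interval.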

Maybe this could help you:

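import pandas as pd

data = {
    'string': ['s', 'x', 's', 's', 's', 'x', 'x', 'x', 's', 'x'],
    'num': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
}

df = pd.DataFrame.from_dict(data)

count = 1  # start at 1 so the first cluster is labelled 'y1'

# enumerate() positions match the labels of the default RangeIndex used by df.loc
for idx, x in enumerate(df['string']):
    if x == 's':
        df.loc[idx, 'new_col'] = 'y' + str(count)
        # The cluster ends when the next row is not 's' or when there is no next row
        if idx + 1 == len(df) or df.loc[idx + 1, 'string'] != 's':
            count += 1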

I iterate over the df['string'] Series looking for a match (in this case 's') and check the value at the next index; if it is not 's', the cluster has ended, so I add 1 to count.
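The same labels can also be produced without an explicit loop. A minimal sketch; label_clusters is a hypothetical helper name, not a pandas function. It marks the start of each run with shift() and cumsum() and numbers only the matching runs, so a trailing 's' row needs no special handling:

```python
import pandas as pd


def label_clusters(col, target, prefix):
    """Label each run of consecutive `target` values as prefix1, prefix2, ...

    Rows that do not match are returned as NaN. (Hypothetical helper.)
    """
    hit = col == target
    # A run starts wherever the match flag changes from the previous row
    run_id = (hit != hit.shift()).cumsum()
    # Dense-rank the run ids of matching rows so clusters are numbered 1, 2, 3, ...
    numbers = run_id[hit].rank(method='dense').astype(int)
    # Put the labels back on the full index; non-matching rows become NaN
    return (prefix + numbers.astype(str)).reindex(col.index)


data = {
    'string': ['s', 'x', 's', 's', 's', 'x', 'x', 'x', 's', 'x'],
    'num': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
}
df = pd.DataFrame(data)
df['new_col'] = label_clusters(df['string'], 's', 'y')
print(df)
```

On the example data this gives y1, y2, y2, y2, y3 for the 's' rows and NaN elsewhere, the same result as the loop when count starts at 1.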

Posters in this thread: mjack24, scidam, mjack24, mjack24, FranSPG