Python Forum
Add column based on others - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: Add column based on others (/thread-33071.html)



Add column based on others - timste - Mar-27-2021

Hi there,

This is a dataframe with glacial lakes and their volume:

data = {'Volume':  ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
        'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
dataframe= pd.DataFrame(data, columns = ['Volume', 'GlacierID'])
I would like to create a third column 'LakeID' assigning every value a new ID, which is composed of 'GlacierID' and an extra number according to the number of lakes. Ideally the largest volume gets number 1.

The output should look like this: (My real dataframe is much larger, so I can't do it manually)

Volume GlacierID LakeID
0     50         a     a2
1    100         a     a1
2    100         b     b1
3     90         b     b2
4     60         c     c1
5    200         d     d1
6    150         d     d2
7     80         d     d3
8     70         e     e1
I really don't have any idea how to do this so I cannot include any further coding.

Thank you!


RE: Add column based on others - eddywinch82 - Mar-27-2021

Hi timste,

Try these lines of Code,

For the Dataset you have provided :-

import pandas as pd

data = {'Volume':  ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
        'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
dataframe= pd.DataFrame(data, columns = ['Volume', 'GlacierID', 'LakeID'])

dataframe.loc[dataframe['Volume'].str.contains('50') & dataframe['GlacierID'].str.contains('a'), 'LakeID'] = 'a2'

dataframe.loc[dataframe['Volume'].str.contains('100') & dataframe['GlacierID'].str.contains('a|b'), 'LakeID'] = 'a1','b1'

dataframe.loc[dataframe['Volume'].str.contains('90') & dataframe['GlacierID'].str.contains('b'), 'LakeID'] = 'b2'

dataframe.loc[dataframe['Volume'].str.contains('60') & dataframe['GlacierID'].str.contains('c'), 'LakeID'] = 'c1'

dataframe.loc[dataframe['Volume'].str.contains('200') & dataframe['GlacierID'].str.contains('d'), 'LakeID'] = 'd1'

dataframe.loc[dataframe['Volume'].str.contains('150') & dataframe['GlacierID'].str.contains('d'), 'LakeID'] = 'd2'

dataframe.loc[dataframe['Volume'].str.contains('80') & dataframe['GlacierID'].str.contains('d'), 'LakeID'] = 'd3'

dataframe.loc[dataframe['Volume'].str.contains('70') & dataframe['GlacierID'].str.contains('e'), 'LakeID'] = 'e1'

dataframe
Those lines of Code, gives the DataFrame Output you wan't.

I will try to come up with a better way, using np.where Code, because that would take up less lines of Code, than typing many lines like I have, if your Dataset is actually much larger.

Best Regards

Eddie Winch Smile


RE: Add column based on others - timste - Mar-27-2021

Hi Eddie

Thank you for your reply. Unfortunately my real dataframe contains 3600 rows. Here a brief look:

      ObjectID  Vol_Min  Vol_Max  Vol_Range    Vol_Mean     Vol_Std  \
0            1       68       68          0   68.237305    0.000000   
1            2       21       65         44   43.237305   22.143555   
2            3       16      105         89   62.713623   33.093563   
3            4        0      350        350  147.524007  110.684999   
4            5       20       20          0   20.117188    0.000000   
...        ...      ...      ...        ...         ...         ...   
3595      3596       22      332        311  152.536350   72.093350   
3596      3597       76       76          0   76.196289    0.000000   
3597      3598        3        3          0    2.514648    0.000000   
3598      3599       12      183        171   87.805176   58.601078   
3599      3600        2        2          0    2.441406    0.000000   

           Volume  Tiefe_Min  Tiefe_Max  Tiefe_Range  Tiefe_Mean  Tiefe_Std  \
0       68.237305          1          1            0    0.682373   0.000000   
1       86.474609          0          1            0    0.432373   0.221436   
2      250.854492          0          1            1    0.627136   0.330936   
3     3540.576172          0          4            3    1.475240   1.106850   
4       20.117188          0          0            0    0.201172   0.000000   
...           ...        ...        ...          ...         ...        ...   
3595  2745.654297          0          3            3    1.525363   0.720934   
3596    76.196289          1          1            0    0.761963   0.000000   
3597     2.514648          0          0            0    0.025146   0.000000   
3598  1229.272461          0          2            2    0.878052   0.586011   
3599     2.441406          0          0            0    0.024414   0.000000   

        Elevation Glacier_ID  A_sgi2016     h_mean       h_max    V_year  \
0     2335.817871    A50j/13     0.1625  23.590000   46.389999  0.003862   
1     2346.285645    A50j/13     0.1625  23.590000   46.389999  0.003862   
2     2494.893555    A50j/13     0.1625  23.590000   46.389999  0.003862   
3     2734.219727    A50k/04     1.4310  46.259998  132.179993  0.066240   
4     2725.788086    A50k/04     1.4310  46.259998  132.179993  0.066240   
...           ...        ...        ...        ...         ...       ...   
3595  2526.086426     B85/01     0.4153  23.110001   37.660000  0.009572   
3596  2529.571289     B85/01     0.4153  23.110001   37.660000  0.009572   
3597  2627.306152     B85/01     0.4153  23.110001   37.660000  0.009572   
3598  2529.571289     B85/01     0.4153  23.110001   37.660000  0.009572   
3599  2597.728027     B85/01     0.4153  23.110001   37.660000  0.009572   

        V_2016  Shape_Length  Shape_Area       xy  
0     0.003141          40.0       100.0  A50j/13  
1     0.003141          60.0       200.0  A50j/13  
2     0.003141         100.0       400.0  A50j/13  
3     0.056820         260.0      2400.0  A50k/04  
4     0.056820          40.0       100.0  A50k/04  
...        ...           ...         ...      ...  
3595  0.009572         280.0      1800.0   B85/01  
3596  0.009572          40.0       100.0   B85/01  
3597  0.009572          40.0       100.0   B85/01  
3598  0.009572         180.0      1400.0   B85/01  
3599  0.009572          40.0       100.0   B85/01  

[3600 rows x 22 columns]
The two columns regarding the LakeID are column [7] 'Volume' and column [14] 'Glacier_ID'.

Thank you and have a nice weekend!
Tim


RE: Add column based on others - jefsummers - Mar-29-2021

This gets you tags for Lake IDs by combining the glacier and volume. Sorting, reindexing, then getting the glacier and index combined into an ID is another idea.
import pandas as pd

data = {'Volume':  ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
        'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
df= pd.DataFrame(data, columns = ['Volume', 'GlacierID'])
df['LakeID'] = df['GlacierID']+df['Volume']
df=df.sort_values(['GlacierID','Volume'])

df.head(10)



RE: Add column based on others - devesh_sahu - Apr-01-2021

(Mar-27-2021, 10:52 AM)timste Wrote: Hi there,

This is a dataframe with glacial lakes and their volume:

data = {'Volume':  ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
        'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
dataframe= pd.DataFrame(data, columns = ['Volume', 'GlacierID'])
I would like to create a third column 'LakeID' assigning every value a new ID, which is composed of 'GlacierID' and an extra number according to the number of lakes. Ideally the largest volume gets number 1.

The output should look like this: (My real dataframe is much larger, so I can't do it manually)

Volume GlacierID LakeID
0     50         a     a2
1    100         a     a1
2    100         b     b1
3     90         b     b2
4     60         c     c1
5    200         d     d1
6    150         d     d2
7     80         d     d3
8     70         e     e1
I really don't have any idea how to do this so I cannot include any further coding.

Thank you!



RE: Add column based on others - devesh_sahu - Apr-01-2021

import pandas as pd
import numpy

data = {'Volume': ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
df = pd.DataFrame(data, columns=['Volume', 'GlacierID'])
df.sort_values(by=['GlacierID', 'Volume'], inplace=True)

seq = []
[seq.extend(seq_no) for seq_no in df.groupby('GlacierID').apply(lambda df: numpy.arange(1, len(df) + 1))]
df['seq'] = seq
df['LakeID'] = df['GlacierID']+df['seq'].astype(str)
df.drop(columns=['seq'], inplace=True)
print(df)

I hope this solution will work. let me know if any assistance require.


RE: Add column based on others - timste - Apr-01-2021

Hi there

Thank you for your idea!
I recently sorted the problem using a for-loop with a empty id column and if current_id == old_id then id = id + 1
and using iloc for the transformation.

Your idea seems more elegant though!

Have a nice easterweeekend!
Tim


RE: Add column based on others - devesh_sahu - Apr-03-2021

(Apr-01-2021, 09:49 AM)timste Wrote: Hi there

Thank you for your idea!
I recently sorted the problem using a for-loop with a empty id column and if current_id == old_id then id = id + 1
and using iloc for the transformation.

Your idea seems more elegant though!

Have a nice easterweeekend!
Tim



RE: Add column based on others - devesh_sahu - Apr-03-2021

appreciated Thumbs Up