Posts: 6
Threads: 3
Joined: Mar 2021
Hi there,
This is a dataframe with glacial lakes and their volume:
data = {'Volume': ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
dataframe = pd.DataFrame(data, columns=['Volume', 'GlacierID'])
I would like to create a third column 'LakeID' that assigns every lake a new ID composed of its 'GlacierID' plus a number based on how many lakes that glacier has. Ideally the largest volume gets number 1.
The output should look like this: (My real dataframe is much larger, so I can't do it manually)
Volume GlacierID LakeID
0 50 a a2
1 100 a a1
2 100 b b1
3 90 b b2
4 60 c c1
5 200 d d1
6 150 d d2
7 80 d d3
8 70 e e1
I really don't have any idea how to do this, so I cannot include any further code.
Thank you!
Posts: 218
Threads: 27
Joined: May 2018
Mar-27-2021, 09:05 PM
(This post was last modified: Mar-27-2021, 09:05 PM by eddywinch82.)
Hi timste,
Try these lines of code for the dataset you have provided:
import pandas as pd
data = {'Volume': ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
dataframe = pd.DataFrame(data, columns=['Volume', 'GlacierID', 'LakeID'])
# Fill 'LakeID' row by row by matching on the Volume and GlacierID values.
# Note: str.contains() matches substrings (e.g. '150' also contains '50'),
# so the GlacierID condition is what keeps those rows apart.
dataframe.loc[dataframe['Volume'].str.contains('50') & dataframe['GlacierID'].str.contains('a'), 'LakeID'] = 'a2'
dataframe.loc[dataframe['Volume'].str.contains('100') & dataframe['GlacierID'].str.contains('a|b'), 'LakeID'] = ['a1', 'b1']
dataframe.loc[dataframe['Volume'].str.contains('90') & dataframe['GlacierID'].str.contains('b'), 'LakeID'] = 'b2'
dataframe.loc[dataframe['Volume'].str.contains('60') & dataframe['GlacierID'].str.contains('c'), 'LakeID'] = 'c1'
dataframe.loc[dataframe['Volume'].str.contains('200') & dataframe['GlacierID'].str.contains('d'), 'LakeID'] = 'd1'
dataframe.loc[dataframe['Volume'].str.contains('150') & dataframe['GlacierID'].str.contains('d'), 'LakeID'] = 'd2'
dataframe.loc[dataframe['Volume'].str.contains('80') & dataframe['GlacierID'].str.contains('d'), 'LakeID'] = 'd3'
dataframe.loc[dataframe['Volume'].str.contains('70') & dataframe['GlacierID'].str.contains('e'), 'LakeID'] = 'e1'
dataframe
Those lines of code give the DataFrame output you want.
I will try to come up with a better way using np.where, since that would take fewer lines of code than typing out many lines like I have here, especially as your dataset is actually much larger.
Best Regards
Eddie Winch
Posts: 6
Threads: 3
Joined: Mar 2021
Hi Eddie
Thank you for your reply. Unfortunately, my real dataframe contains 3600 rows. Here is a brief look:
ObjectID Vol_Min Vol_Max Vol_Range Vol_Mean Vol_Std \
0 1 68 68 0 68.237305 0.000000
1 2 21 65 44 43.237305 22.143555
2 3 16 105 89 62.713623 33.093563
3 4 0 350 350 147.524007 110.684999
4 5 20 20 0 20.117188 0.000000
... ... ... ... ... ... ...
3595 3596 22 332 311 152.536350 72.093350
3596 3597 76 76 0 76.196289 0.000000
3597 3598 3 3 0 2.514648 0.000000
3598 3599 12 183 171 87.805176 58.601078
3599 3600 2 2 0 2.441406 0.000000
Volume Tiefe_Min Tiefe_Max Tiefe_Range Tiefe_Mean Tiefe_Std \
0 68.237305 1 1 0 0.682373 0.000000
1 86.474609 0 1 0 0.432373 0.221436
2 250.854492 0 1 1 0.627136 0.330936
3 3540.576172 0 4 3 1.475240 1.106850
4 20.117188 0 0 0 0.201172 0.000000
... ... ... ... ... ... ...
3595 2745.654297 0 3 3 1.525363 0.720934
3596 76.196289 1 1 0 0.761963 0.000000
3597 2.514648 0 0 0 0.025146 0.000000
3598 1229.272461 0 2 2 0.878052 0.586011
3599 2.441406 0 0 0 0.024414 0.000000
Elevation Glacier_ID A_sgi2016 h_mean h_max V_year \
0 2335.817871 A50j/13 0.1625 23.590000 46.389999 0.003862
1 2346.285645 A50j/13 0.1625 23.590000 46.389999 0.003862
2 2494.893555 A50j/13 0.1625 23.590000 46.389999 0.003862
3 2734.219727 A50k/04 1.4310 46.259998 132.179993 0.066240
4 2725.788086 A50k/04 1.4310 46.259998 132.179993 0.066240
... ... ... ... ... ... ...
3595 2526.086426 B85/01 0.4153 23.110001 37.660000 0.009572
3596 2529.571289 B85/01 0.4153 23.110001 37.660000 0.009572
3597 2627.306152 B85/01 0.4153 23.110001 37.660000 0.009572
3598 2529.571289 B85/01 0.4153 23.110001 37.660000 0.009572
3599 2597.728027 B85/01 0.4153 23.110001 37.660000 0.009572
V_2016 Shape_Length Shape_Area xy
0 0.003141 40.0 100.0 A50j/13
1 0.003141 60.0 200.0 A50j/13
2 0.003141 100.0 400.0 A50j/13
3 0.056820 260.0 2400.0 A50k/04
4 0.056820 40.0 100.0 A50k/04
... ... ... ... ...
3595 0.009572 280.0 1800.0 B85/01
3596 0.009572 40.0 100.0 B85/01
3597 0.009572 40.0 100.0 B85/01
3598 0.009572 180.0 1400.0 B85/01
3599 0.009572 40.0 100.0 B85/01
[3600 rows x 22 columns]
The two columns relevant for the LakeID are column [7], 'Volume', and column [14], 'Glacier_ID'.
Thank you and have a nice weekend!
Tim
Posts: 1,358
Threads: 2
Joined: May 2019
This gets you tags for lake IDs by combining the glacier and the volume. Another idea is to sort, reindex, and then combine the glacier with the within-glacier index into an ID; a sketch of that follows the code below.
import pandas as pd
data = {'Volume': ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
df = pd.DataFrame(data, columns=['Volume', 'GlacierID'])
df['LakeID'] = df['GlacierID'] + df['Volume']
df = df.sort_values(['GlacierID', 'Volume'])
df.head(10)
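A minimal sketch of that second idea, using a per-glacier rank rather than an explicit sort and reindex (assuming, as in the example output, that the largest volume in each glacier should get number 1):
import pandas as pd
data = {'Volume': ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
        'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
df = pd.DataFrame(data, columns=['Volume', 'GlacierID'])
# Rank the lakes within each glacier by volume, largest first (rank 1),
# then build the LakeID as the glacier ID plus that rank.
rank = (df['Volume'].astype(int)
        .groupby(df['GlacierID'])
        .rank(method='first', ascending=False)
        .astype(int))
df['LakeID'] = df['GlacierID'] + rank.astype(str)
print(df)
This keeps the original row order and reproduces the LakeID column from the first post.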
Posts: 4
Threads: 0
Joined: Apr 2021
(Mar-27-2021, 10:52 AM)timste Wrote: [original question quoted in full]
Posts: 4
Threads: 0
Joined: Apr 2021
import pandas as pd
import numpy
data = {'Volume': ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
        'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
df = pd.DataFrame(data, columns=['Volume', 'GlacierID'])
# Sort by glacier and by numeric volume, largest first, so that the
# biggest lake in each glacier gets sequence number 1.
df['Volume'] = df['Volume'].astype(int)
df.sort_values(by=['GlacierID', 'Volume'], ascending=[True, False], inplace=True)
# Number the lakes 1..n within each glacier and append that to the glacier ID.
seq = []
for _, group in df.groupby('GlacierID'):
    seq.extend(numpy.arange(1, len(group) + 1))
df['seq'] = seq
df['LakeID'] = df['GlacierID'] + df['seq'].astype(str)
df.drop(columns=['seq'], inplace=True)
print(df)
I hope this solution will work. Let me know if you require any assistance.
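For the real dataframe, the same pattern should carry over to the actual column names. A sketch, assuming the full frame is already loaded as df, that its 'Volume' column is numeric there, and that the glacier column is called 'Glacier_ID' as in the sample output:
# Sketch for the 3600-row dataframe (assumed to be loaded as df,
# with a numeric 'Volume' column and a 'Glacier_ID' column).
df = df.sort_values(by=['Glacier_ID', 'Volume'], ascending=[True, False])
# cumcount() numbers the rows 0..n-1 within each glacier; +1 gives 1..n.
df['LakeID'] = df['Glacier_ID'] + (df.groupby('Glacier_ID').cumcount() + 1).astype(str)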
Posts: 6
Threads: 3
Joined: Mar 2021
Hi there
Thank you for your idea!
I recently sorted the problem out using a for-loop with an empty ID column: if current_id == old_id then id = id + 1 (otherwise the counter restarts at 1),
and I used iloc for the assignment; roughly like the sketch below.
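The loop looked something like this (a rough sketch on the toy data from my first post, not my exact code):
import pandas as pd
data = {'Volume': [50, 100, 100, 90, 60, 200, 150, 80, 70],
        'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
df = pd.DataFrame(data).sort_values(['GlacierID', 'Volume'],
                                    ascending=[True, False])
df['LakeID'] = ''   # empty ID column, filled row by row
old_id, counter = None, 0
for i in range(len(df)):
    current_id = df.iloc[i]['GlacierID']
    # same glacier as the previous row: keep counting, otherwise restart at 1
    counter = counter + 1 if current_id == old_id else 1
    df.iloc[i, df.columns.get_loc('LakeID')] = current_id + str(counter)
    old_id = current_id
print(df.sort_index())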
Your idea seems more elegant though!
Have a nice Easter weekend!
Tim
Posts: 4
Threads: 0
Joined: Apr 2021
(Apr-01-2021, 09:49 AM)timste Wrote: [reply quoted in full]
Posts: 4
Threads: 0
Joined: Apr 2021
appreciated