Python Forum
Add column based on others
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Add column based on others
#1
Hi there,

This is a dataframe with glacial lakes and their volume:

data = {'Volume':  ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
        'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
dataframe= pd.DataFrame(data, columns = ['Volume', 'GlacierID'])
I would like to create a third column 'LakeID' assigning every value a new ID, which is composed of 'GlacierID' and an extra number according to the number of lakes. Ideally the largest volume gets number 1.

The output should look like this: (My real dataframe is much larger, so I can't do it manually)

Volume GlacierID LakeID
0     50         a     a2
1    100         a     a1
2    100         b     b1
3     90         b     b2
4     60         c     c1
5    200         d     d1
6    150         d     d2
7     80         d     d3
8     70         e     e1
I really don't have any idea how to do this so I cannot include any further coding.

Thank you!
Reply
#2
Hi timste,

Try these lines of Code,

For the Dataset you have provided :-

import pandas as pd

data = {'Volume':  ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
        'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
dataframe= pd.DataFrame(data, columns = ['Volume', 'GlacierID', 'LakeID'])

dataframe.loc[dataframe['Volume'].str.contains('50') & dataframe['GlacierID'].str.contains('a'), 'LakeID'] = 'a2'

dataframe.loc[dataframe['Volume'].str.contains('100') & dataframe['GlacierID'].str.contains('a|b'), 'LakeID'] = 'a1','b1'

dataframe.loc[dataframe['Volume'].str.contains('90') & dataframe['GlacierID'].str.contains('b'), 'LakeID'] = 'b2'

dataframe.loc[dataframe['Volume'].str.contains('60') & dataframe['GlacierID'].str.contains('c'), 'LakeID'] = 'c1'

dataframe.loc[dataframe['Volume'].str.contains('200') & dataframe['GlacierID'].str.contains('d'), 'LakeID'] = 'd1'

dataframe.loc[dataframe['Volume'].str.contains('150') & dataframe['GlacierID'].str.contains('d'), 'LakeID'] = 'd2'

dataframe.loc[dataframe['Volume'].str.contains('80') & dataframe['GlacierID'].str.contains('d'), 'LakeID'] = 'd3'

dataframe.loc[dataframe['Volume'].str.contains('70') & dataframe['GlacierID'].str.contains('e'), 'LakeID'] = 'e1'

dataframe
Those lines of Code, gives the DataFrame Output you wan't.

I will try to come up with a better way, using np.where Code, because that would take up less lines of Code, than typing many lines like I have, if your Dataset is actually much larger.

Best Regards

Eddie Winch Smile
Reply
#3
Hi Eddie

Thank you for your reply. Unfortunately my real dataframe contains 3600 rows. Here a brief look:

      ObjectID  Vol_Min  Vol_Max  Vol_Range    Vol_Mean     Vol_Std  \
0            1       68       68          0   68.237305    0.000000   
1            2       21       65         44   43.237305   22.143555   
2            3       16      105         89   62.713623   33.093563   
3            4        0      350        350  147.524007  110.684999   
4            5       20       20          0   20.117188    0.000000   
...        ...      ...      ...        ...         ...         ...   
3595      3596       22      332        311  152.536350   72.093350   
3596      3597       76       76          0   76.196289    0.000000   
3597      3598        3        3          0    2.514648    0.000000   
3598      3599       12      183        171   87.805176   58.601078   
3599      3600        2        2          0    2.441406    0.000000   

           Volume  Tiefe_Min  Tiefe_Max  Tiefe_Range  Tiefe_Mean  Tiefe_Std  \
0       68.237305          1          1            0    0.682373   0.000000   
1       86.474609          0          1            0    0.432373   0.221436   
2      250.854492          0          1            1    0.627136   0.330936   
3     3540.576172          0          4            3    1.475240   1.106850   
4       20.117188          0          0            0    0.201172   0.000000   
...           ...        ...        ...          ...         ...        ...   
3595  2745.654297          0          3            3    1.525363   0.720934   
3596    76.196289          1          1            0    0.761963   0.000000   
3597     2.514648          0          0            0    0.025146   0.000000   
3598  1229.272461          0          2            2    0.878052   0.586011   
3599     2.441406          0          0            0    0.024414   0.000000   

        Elevation Glacier_ID  A_sgi2016     h_mean       h_max    V_year  \
0     2335.817871    A50j/13     0.1625  23.590000   46.389999  0.003862   
1     2346.285645    A50j/13     0.1625  23.590000   46.389999  0.003862   
2     2494.893555    A50j/13     0.1625  23.590000   46.389999  0.003862   
3     2734.219727    A50k/04     1.4310  46.259998  132.179993  0.066240   
4     2725.788086    A50k/04     1.4310  46.259998  132.179993  0.066240   
...           ...        ...        ...        ...         ...       ...   
3595  2526.086426     B85/01     0.4153  23.110001   37.660000  0.009572   
3596  2529.571289     B85/01     0.4153  23.110001   37.660000  0.009572   
3597  2627.306152     B85/01     0.4153  23.110001   37.660000  0.009572   
3598  2529.571289     B85/01     0.4153  23.110001   37.660000  0.009572   
3599  2597.728027     B85/01     0.4153  23.110001   37.660000  0.009572   

        V_2016  Shape_Length  Shape_Area       xy  
0     0.003141          40.0       100.0  A50j/13  
1     0.003141          60.0       200.0  A50j/13  
2     0.003141         100.0       400.0  A50j/13  
3     0.056820         260.0      2400.0  A50k/04  
4     0.056820          40.0       100.0  A50k/04  
...        ...           ...         ...      ...  
3595  0.009572         280.0      1800.0   B85/01  
3596  0.009572          40.0       100.0   B85/01  
3597  0.009572          40.0       100.0   B85/01  
3598  0.009572         180.0      1400.0   B85/01  
3599  0.009572          40.0       100.0   B85/01  

[3600 rows x 22 columns]
The two columns regarding the LakeID are column [7] 'Volume' and column [14] 'Glacier_ID'.

Thank you and have a nice weekend!
Tim
Reply
#4
This gets you tags for Lake IDs by combining the glacier and volume. Sorting, reindexing, then getting the glacier and index combined into an ID is another idea.
import pandas as pd

data = {'Volume':  ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
        'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
df= pd.DataFrame(data, columns = ['Volume', 'GlacierID'])
df['LakeID'] = df['GlacierID']+df['Volume']
df=df.sort_values(['GlacierID','Volume'])

df.head(10)
Reply
#5
(Mar-27-2021, 10:52 AM)timste Wrote: Hi there,

This is a dataframe with glacial lakes and their volume:

data = {'Volume':  ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
        'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
dataframe= pd.DataFrame(data, columns = ['Volume', 'GlacierID'])
I would like to create a third column 'LakeID' assigning every value a new ID, which is composed of 'GlacierID' and an extra number according to the number of lakes. Ideally the largest volume gets number 1.

The output should look like this: (My real dataframe is much larger, so I can't do it manually)

Volume GlacierID LakeID
0     50         a     a2
1    100         a     a1
2    100         b     b1
3     90         b     b2
4     60         c     c1
5    200         d     d1
6    150         d     d2
7     80         d     d3
8     70         e     e1
I really don't have any idea how to do this so I cannot include any further coding.

Thank you!
Reply
#6
import pandas as pd
import numpy

data = {'Volume': ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
df = pd.DataFrame(data, columns=['Volume', 'GlacierID'])
df.sort_values(by=['GlacierID', 'Volume'], inplace=True)

seq = []
[seq.extend(seq_no) for seq_no in df.groupby('GlacierID').apply(lambda df: numpy.arange(1, len(df) + 1))]
df['seq'] = seq
df['LakeID'] = df['GlacierID']+df['seq'].astype(str)
df.drop(columns=['seq'], inplace=True)
print(df)

I hope this solution will work. let me know if any assistance require.
Reply
#7
Hi there

Thank you for your idea!
I recently sorted the problem using a for-loop with a empty id column and if current_id == old_id then id = id + 1
and using iloc for the transformation.

Your idea seems more elegant though!

Have a nice easterweeekend!
Tim
Reply
#8
(Apr-01-2021, 09:49 AM)timste Wrote: Hi there

Thank you for your idea!
I recently sorted the problem using a for-loop with a empty id column and if current_id == old_id then id = id + 1
and using iloc for the transformation.

Your idea seems more elegant though!

Have a nice easterweeekend!
Tim
Reply
#9
appreciated Thumbs Up
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Make unique id in vectorized way based on text data column with similarity scoring ill8 0 861 Dec-12-2022, 03:22 AM
Last Post: ill8
  New Dataframe Column Based on Several Conditions nb1214 1 1,783 Nov-16-2021, 10:52 PM
Last Post: jefsummers
  Pandas Data frame column condition check based on length of the value aditi06 1 2,655 Jul-28-2021, 11:08 AM
Last Post: jefsummers
Question [Solved] How to refer to dataframe column name based on a list lorensa74 1 2,238 May-17-2021, 07:02 AM
Last Post: lorensa74
  Extracting rows based on condition on one column Robotguy 2 2,172 Aug-07-2020, 02:27 AM
Last Post: Robotguy
  Filter data based on a value from another dataframe column and create a file using lo pawanmtm 1 4,245 Jul-15-2020, 06:20 PM
Last Post: pawanmtm
  Pandas - Dynamic column aggregation based on another column theroadbacktonature 0 3,011 Apr-17-2020, 04:54 PM
Last Post: theroadbacktonature
Question Dividing a single column of dataframe into multiple columns based on char length darpInd 2 2,418 Mar-14-2020, 09:19 AM
Last Post: scidam
  Sum product multiple Dataframes based on column headers. Lastwizzle 0 3,805 May-21-2019, 04:05 PM
Last Post: Lastwizzle
  Create selection box to pass string value based on uniques in Excel column sneakysnek 1 2,483 Nov-18-2018, 07:29 PM
Last Post: Stefanovietch

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020