Python Forum
Add column based on others
#1
Hi there,

This is a dataframe with glacial lakes and their volume:

import pandas as pd

data = {'Volume':  ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
        'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
dataframe = pd.DataFrame(data, columns=['Volume', 'GlacierID'])
I would like to create a third column 'LakeID' that assigns every lake a new ID composed of its 'GlacierID' plus a running number per glacier; ideally the lake with the largest volume gets number 1.

The output should look like this (my real dataframe is much larger, so I can't do it manually):

  Volume GlacierID LakeID
0     50         a     a2
1    100         a     a1
2    100         b     b1
3     90         b     b2
4     60         c     c1
5    200         d     d1
6    150         d     d2
7     80         d     d3
8     70         e     e1
I really don't have any idea how to do this, so I can't include any further code.

Thank you!
#2
Hi timste,

Try these lines of code for the dataset you have provided:

import pandas as pd

data = {'Volume':  ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
        'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
# Listing 'LakeID' in columns creates it as an empty (NaN) column to fill in
dataframe = pd.DataFrame(data, columns=['Volume', 'GlacierID', 'LakeID'])

# Fill 'LakeID' case by case, matching each Volume/GlacierID combination
dataframe.loc[dataframe['Volume'].str.contains('50') & dataframe['GlacierID'].str.contains('a'), 'LakeID'] = 'a2'
# Volume '100' occurs for glaciers 'a' and 'b'; the two values are assigned in row order
dataframe.loc[dataframe['Volume'].str.contains('100') & dataframe['GlacierID'].str.contains('a|b'), 'LakeID'] = 'a1', 'b1'
dataframe.loc[dataframe['Volume'].str.contains('90') & dataframe['GlacierID'].str.contains('b'), 'LakeID'] = 'b2'
dataframe.loc[dataframe['Volume'].str.contains('60') & dataframe['GlacierID'].str.contains('c'), 'LakeID'] = 'c1'
dataframe.loc[dataframe['Volume'].str.contains('200') & dataframe['GlacierID'].str.contains('d'), 'LakeID'] = 'd1'
dataframe.loc[dataframe['Volume'].str.contains('150') & dataframe['GlacierID'].str.contains('d'), 'LakeID'] = 'd2'
dataframe.loc[dataframe['Volume'].str.contains('80') & dataframe['GlacierID'].str.contains('d'), 'LakeID'] = 'd3'
dataframe.loc[dataframe['Volume'].str.contains('70') & dataframe['GlacierID'].str.contains('e'), 'LakeID'] = 'e1'

dataframe
Those lines of code give the DataFrame output you want.

I will try to come up with a better way using np.where, because that would take fewer lines of code than typing out every case like I have, especially since your dataset is actually much larger.
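In the meantime, here is a rough sketch of what a numpy.select version (a close relative of np.where that takes a list of conditions and matching choices) might look like for the toy data; it still spells out one condition per lake, so it mainly tidies the syntax rather than scaling to a large dataset:

import pandas as pd
import numpy as np

data = {'Volume':  ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
        'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
dataframe = pd.DataFrame(data, columns=['Volume', 'GlacierID'])

# One boolean condition per lake, paired with the LakeID it should receive
conditions = [
    (dataframe['Volume'] == '50')  & (dataframe['GlacierID'] == 'a'),
    (dataframe['Volume'] == '100') & (dataframe['GlacierID'] == 'a'),
    (dataframe['Volume'] == '100') & (dataframe['GlacierID'] == 'b'),
    (dataframe['Volume'] == '90')  & (dataframe['GlacierID'] == 'b'),
    (dataframe['Volume'] == '60')  & (dataframe['GlacierID'] == 'c'),
    (dataframe['Volume'] == '200') & (dataframe['GlacierID'] == 'd'),
    (dataframe['Volume'] == '150') & (dataframe['GlacierID'] == 'd'),
    (dataframe['Volume'] == '80')  & (dataframe['GlacierID'] == 'd'),
    (dataframe['Volume'] == '70')  & (dataframe['GlacierID'] == 'e'),
]
choices = ['a2', 'a1', 'b1', 'b2', 'c1', 'd1', 'd2', 'd3', 'e1']

dataframe['LakeID'] = np.select(conditions, choices, default='')
print(dataframe)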

Best Regards

Eddie Winch
#3
Hi Eddie

Thank you for your reply. Unfortunately my real dataframe contains 3600 rows. Here is a brief look:

      ObjectID  Vol_Min  Vol_Max  Vol_Range    Vol_Mean     Vol_Std  \
0            1       68       68          0   68.237305    0.000000  
1            2       21       65         44   43.237305   22.143555  
2            3       16      105         89   62.713623   33.093563  
3            4        0      350        350  147.524007  110.684999  
4            5       20       20          0   20.117188    0.000000  
...        ...      ...      ...        ...         ...         ...  
3595      3596       22      332        311  152.536350   72.093350  
3596      3597       76       76          0   76.196289    0.000000  
3597      3598        3        3          0    2.514648    0.000000  
3598      3599       12      183        171   87.805176   58.601078  
3599      3600        2        2          0    2.441406    0.000000  
 
           Volume  Tiefe_Min  Tiefe_Max  Tiefe_Range  Tiefe_Mean  Tiefe_Std  \
0       68.237305          1          1            0    0.682373   0.000000  
1       86.474609          0          1            0    0.432373   0.221436  
2      250.854492          0          1            1    0.627136   0.330936  
3     3540.576172          0          4            3    1.475240   1.106850  
4       20.117188          0          0            0    0.201172   0.000000  
...           ...        ...        ...          ...         ...        ...  
3595  2745.654297          0          3            3    1.525363   0.720934  
3596    76.196289          1          1            0    0.761963   0.000000  
3597     2.514648          0          0            0    0.025146   0.000000  
3598  1229.272461          0          2            2    0.878052   0.586011  
3599     2.441406          0          0            0    0.024414   0.000000  
 
        Elevation Glacier_ID  A_sgi2016     h_mean       h_max    V_year  \
0     2335.817871    A50j/13     0.1625  23.590000   46.389999  0.003862  
1     2346.285645    A50j/13     0.1625  23.590000   46.389999  0.003862  
2     2494.893555    A50j/13     0.1625  23.590000   46.389999  0.003862  
3     2734.219727    A50k/04     1.4310  46.259998  132.179993  0.066240  
4     2725.788086    A50k/04     1.4310  46.259998  132.179993  0.066240  
...           ...        ...        ...        ...         ...       ...  
3595  2526.086426     B85/01     0.4153  23.110001   37.660000  0.009572  
3596  2529.571289     B85/01     0.4153  23.110001   37.660000  0.009572  
3597  2627.306152     B85/01     0.4153  23.110001   37.660000  0.009572  
3598  2529.571289     B85/01     0.4153  23.110001   37.660000  0.009572  
3599  2597.728027     B85/01     0.4153  23.110001   37.660000  0.009572  
 
        V_2016  Shape_Length  Shape_Area       xy 
0     0.003141          40.0       100.0  A50j/13 
1     0.003141          60.0       200.0  A50j/13 
2     0.003141         100.0       400.0  A50j/13 
3     0.056820         260.0      2400.0  A50k/04 
4     0.056820          40.0       100.0  A50k/04 
...        ...           ...         ...      ... 
3595  0.009572         280.0      1800.0   B85/01 
3596  0.009572          40.0       100.0   B85/01 
3597  0.009572          40.0       100.0   B85/01 
3598  0.009572         180.0      1400.0   B85/01 
3599  0.009572          40.0       100.0   B85/01 
 
[3600 rows x 22 columns]
The two columns relevant for the LakeID are column [7], 'Volume', and column [14], 'Glacier_ID'.

Thank you and have a nice weekend!
Tim
#4
This gets you tags for the Lake IDs by combining the glacier and the volume. Sorting, reindexing, then combining the glacier with a within-group index into an ID is another idea; a sketch of that variant follows the code below.
import pandas as pd

data = {'Volume':  ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
        'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
df = pd.DataFrame(data, columns=['Volume', 'GlacierID'])

# Tag each lake with its glacier plus its volume
df['LakeID'] = df['GlacierID'] + df['Volume']
df = df.sort_values(['GlacierID', 'Volume'])

df.head(10)
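The second variant could look roughly like the sketch below. It assumes the Volume column is converted to numbers so the largest lake of each glacier comes first, and uses groupby().cumcount() for the within-group index:

import pandas as pd

data = {'Volume':  ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
        'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
df = pd.DataFrame(data, columns=['Volume', 'GlacierID'])

# Sort numerically, largest volume first within each glacier
df['Volume'] = df['Volume'].astype(int)
df = df.sort_values(['GlacierID', 'Volume'], ascending=[True, False])

# cumcount() numbers the rows of each glacier 0, 1, 2, ... in the sorted order
df['LakeID'] = df['GlacierID'] + (df.groupby('GlacierID').cumcount() + 1).astype(str)

df.head(10)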
#6
import pandas as pd
import numpy as np

data = {'Volume': ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
        'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
df = pd.DataFrame(data, columns=['Volume', 'GlacierID'])

# Sort numerically so the largest volume comes first within each glacier
df['Volume'] = df['Volume'].astype(int)
df.sort_values(by=['GlacierID', 'Volume'], ascending=[True, False], inplace=True)

# Build a running sequence 1..n for each glacier, in the sorted order
seq = []
for _, group in df.groupby('GlacierID'):
    seq.extend(np.arange(1, len(group) + 1))
df['seq'] = seq
df['LakeID'] = df['GlacierID'] + df['seq'].astype(str)
df.drop(columns=['seq'], inplace=True)
print(df)

I hope this solution works. Let me know if you require any further assistance.
#7
Hi there

Thank you for your idea!
I recently solved the problem using a for loop with an empty ID column: if current_id == old_id the counter is increased by one, and the values are written back with iloc.
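For reference, a loop along those lines might look roughly like this (a minimal sketch of the counter idea described above, not the exact code used):

import pandas as pd

data = {'Volume':  ['50', '100', '100', '90', '60', '200', '150', '80', '70'],
        'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']}
df = pd.DataFrame(data, columns=['Volume', 'GlacierID'])

# Largest volume first within each glacier, so the counter starts at 1 for the biggest lake
df['Volume'] = df['Volume'].astype(int)
df = df.sort_values(['GlacierID', 'Volume'], ascending=[True, False]).reset_index(drop=True)

df['LakeID'] = ''
old_id = None
number = 0
for i in range(len(df)):
    current_id = df.iloc[i]['GlacierID']
    # Increase the counter within a glacier, reset it when a new glacier starts
    number = number + 1 if current_id == old_id else 1
    df.iloc[i, df.columns.get_loc('LakeID')] = current_id + str(number)
    old_id = current_id
print(df)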

Your idea seems more elegant though!

Have a nice Easter weekend!
Tim
#9
Appreciated!


