Add column based on others - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: Data Science (https://python-forum.io/forum-44.html) +--- Thread: Add column based on others (/thread-33071.html) |
Add column based on others - timste - Mar-27-2021 Hi there, This is a dataframe with glacial lakes and their volume: data = {'Volume': ['50', '100', '100', '90', '60', '200', '150', '80', '70'], 'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']} dataframe= pd.DataFrame(data, columns = ['Volume', 'GlacierID'])I would like to create a third column 'LakeID' assigning every value a new ID, which is composed of 'GlacierID' and an extra number according to the number of lakes. Ideally the largest volume gets number 1. The output should look like this: (My real dataframe is much larger, so I can't do it manually) Volume GlacierID LakeID 0 50 a a2 1 100 a a1 2 100 b b1 3 90 b b2 4 60 c c1 5 200 d d1 6 150 d d2 7 80 d d3 8 70 e e1I really don't have any idea how to do this so I cannot include any further coding. Thank you! RE: Add column based on others - eddywinch82 - Mar-27-2021 Hi timste, Try these lines of Code, For the Dataset you have provided :- import pandas as pd data = {'Volume': ['50', '100', '100', '90', '60', '200', '150', '80', '70'], 'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']} dataframe= pd.DataFrame(data, columns = ['Volume', 'GlacierID', 'LakeID']) dataframe.loc[dataframe['Volume'].str.contains('50') & dataframe['GlacierID'].str.contains('a'), 'LakeID'] = 'a2' dataframe.loc[dataframe['Volume'].str.contains('100') & dataframe['GlacierID'].str.contains('a|b'), 'LakeID'] = 'a1','b1' dataframe.loc[dataframe['Volume'].str.contains('90') & dataframe['GlacierID'].str.contains('b'), 'LakeID'] = 'b2' dataframe.loc[dataframe['Volume'].str.contains('60') & dataframe['GlacierID'].str.contains('c'), 'LakeID'] = 'c1' dataframe.loc[dataframe['Volume'].str.contains('200') & dataframe['GlacierID'].str.contains('d'), 'LakeID'] = 'd1' dataframe.loc[dataframe['Volume'].str.contains('150') & dataframe['GlacierID'].str.contains('d'), 'LakeID'] = 'd2' dataframe.loc[dataframe['Volume'].str.contains('80') & dataframe['GlacierID'].str.contains('d'), 'LakeID'] = 'd3' dataframe.loc[dataframe['Volume'].str.contains('70') & dataframe['GlacierID'].str.contains('e'), 'LakeID'] = 'e1' dataframeThose lines of Code, gives the DataFrame Output you wan't. I will try to come up with a better way, using np.where Code, because that would take up less lines of Code, than typing many lines like I have, if your Dataset is actually much larger. Best Regards Eddie Winch RE: Add column based on others - timste - Mar-27-2021 Hi Eddie Thank you for your reply. Unfortunately my real dataframe contains 3600 rows. Here a brief look: ObjectID Vol_Min Vol_Max Vol_Range Vol_Mean Vol_Std \ 0 1 68 68 0 68.237305 0.000000 1 2 21 65 44 43.237305 22.143555 2 3 16 105 89 62.713623 33.093563 3 4 0 350 350 147.524007 110.684999 4 5 20 20 0 20.117188 0.000000 ... ... ... ... ... ... ... 3595 3596 22 332 311 152.536350 72.093350 3596 3597 76 76 0 76.196289 0.000000 3597 3598 3 3 0 2.514648 0.000000 3598 3599 12 183 171 87.805176 58.601078 3599 3600 2 2 0 2.441406 0.000000 Volume Tiefe_Min Tiefe_Max Tiefe_Range Tiefe_Mean Tiefe_Std \ 0 68.237305 1 1 0 0.682373 0.000000 1 86.474609 0 1 0 0.432373 0.221436 2 250.854492 0 1 1 0.627136 0.330936 3 3540.576172 0 4 3 1.475240 1.106850 4 20.117188 0 0 0 0.201172 0.000000 ... ... ... ... ... ... ... 3595 2745.654297 0 3 3 1.525363 0.720934 3596 76.196289 1 1 0 0.761963 0.000000 3597 2.514648 0 0 0 0.025146 0.000000 3598 1229.272461 0 2 2 0.878052 0.586011 3599 2.441406 0 0 0 0.024414 0.000000 Elevation Glacier_ID A_sgi2016 h_mean h_max V_year \ 0 2335.817871 A50j/13 0.1625 23.590000 46.389999 0.003862 1 2346.285645 A50j/13 0.1625 23.590000 46.389999 0.003862 2 2494.893555 A50j/13 0.1625 23.590000 46.389999 0.003862 3 2734.219727 A50k/04 1.4310 46.259998 132.179993 0.066240 4 2725.788086 A50k/04 1.4310 46.259998 132.179993 0.066240 ... ... ... ... ... ... ... 3595 2526.086426 B85/01 0.4153 23.110001 37.660000 0.009572 3596 2529.571289 B85/01 0.4153 23.110001 37.660000 0.009572 3597 2627.306152 B85/01 0.4153 23.110001 37.660000 0.009572 3598 2529.571289 B85/01 0.4153 23.110001 37.660000 0.009572 3599 2597.728027 B85/01 0.4153 23.110001 37.660000 0.009572 V_2016 Shape_Length Shape_Area xy 0 0.003141 40.0 100.0 A50j/13 1 0.003141 60.0 200.0 A50j/13 2 0.003141 100.0 400.0 A50j/13 3 0.056820 260.0 2400.0 A50k/04 4 0.056820 40.0 100.0 A50k/04 ... ... ... ... ... 3595 0.009572 280.0 1800.0 B85/01 3596 0.009572 40.0 100.0 B85/01 3597 0.009572 40.0 100.0 B85/01 3598 0.009572 180.0 1400.0 B85/01 3599 0.009572 40.0 100.0 B85/01 [3600 rows x 22 columns]The two columns regarding the LakeID are column [7] 'Volume' and column [14] 'Glacier_ID'. Thank you and have a nice weekend! Tim RE: Add column based on others - jefsummers - Mar-29-2021 This gets you tags for Lake IDs by combining the glacier and volume. Sorting, reindexing, then getting the glacier and index combined into an ID is another idea. import pandas as pd data = {'Volume': ['50', '100', '100', '90', '60', '200', '150', '80', '70'], 'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']} df= pd.DataFrame(data, columns = ['Volume', 'GlacierID']) df['LakeID'] = df['GlacierID']+df['Volume'] df=df.sort_values(['GlacierID','Volume']) df.head(10) RE: Add column based on others - devesh_sahu - Apr-01-2021 (Mar-27-2021, 10:52 AM)timste Wrote: Hi there, RE: Add column based on others - devesh_sahu - Apr-01-2021 import pandas as pd import numpy data = {'Volume': ['50', '100', '100', '90', '60', '200', '150', '80', '70'], 'GlacierID': ['a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'e']} df = pd.DataFrame(data, columns=['Volume', 'GlacierID']) df.sort_values(by=['GlacierID', 'Volume'], inplace=True) seq = [] [seq.extend(seq_no) for seq_no in df.groupby('GlacierID').apply(lambda df: numpy.arange(1, len(df) + 1))] df['seq'] = seq df['LakeID'] = df['GlacierID']+df['seq'].astype(str) df.drop(columns=['seq'], inplace=True) print(df) I hope this solution will work. let me know if any assistance require. RE: Add column based on others - timste - Apr-01-2021 Hi there Thank you for your idea! I recently sorted the problem using a for-loop with a empty id column and if current_id == old_id then id = id + 1 and using iloc for the transformation. Your idea seems more elegant though! Have a nice easterweeekend! Tim RE: Add column based on others - devesh_sahu - Apr-03-2021 (Apr-01-2021, 09:49 AM)timste Wrote: Hi there RE: Add column based on others - devesh_sahu - Apr-03-2021 appreciated |