groupby cast to int where possible - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: General Coding Help (https://python-forum.io/forum-8.html) +--- Thread: groupby cast to int where possible (/thread-34151.html) |
groupby cast to int where possible - mikisDeWitte - Jul-01-2021 Hello, I need to create a nested dictionary like so: d= {"ftsg2a_co422": {1: 'visited or contacted', 0: 'not visited not contacted'}} this dictionary then goes to a function that sets value labels to an SPSS file. The data comes from a database where these 0's and 1's can be a string. When I execute the code below, I get '1': 'promotor' Is there a way where I can modify this groupby, cast the value to integer where possible, so that I get 1 : promotor 'rgDEU': 'Germany' etc? I hope my question makes sense... Regards, Mikis import pandas as pd #So all of this is read in from a database: var= ["ftq10a_bra1", "ftq10a_bra1", "ftq10a_bra1", "ftq10a_bra2", "ftq10a_bra2", "ftq10a_bra2", "d1ccode", "d1ccode"] value= ["1", "2", "3", "1", "2", "3", "rgDEU", "rgFRA"] label= ["promotor", "passive", "detractor", "promotor", "passive", "detractor", "Germany", "France"] data_tuples = list(zip(var,value, label)) column_labels = pd.DataFrame(data_tuples, columns=['var','value', 'label']) #This is what I need to modify, so that "1" becomes 1 d = {k: f.groupby('value')['label'].apply(lambda x: x.iloc[0]).to_dict() for k, f in column_labels.groupby('var')} RE: groupby cast to int where possible - mikisDeWitte - Jul-08-2021 In case it helps anyone, I found a workaround for my issue. Basically, I try to cast the column to integers, using errors="coerce". If it can't be interpreted as a number, it converts it to nan. I can then test if it's nan (and thus string) using pd.isnull() This series of true/false is then used to split the dataframe in two. in the dataframe where it can be converted ,I cast the column to integers. Finally, I apply my dictionary formula on both dataframes seperatly and combine both results. It ain't pretty, but it works import pandas as pd #So all of this is read in from a database: var= ["ftq10a_bra1", "ftq10a_bra1", "ftq10a_bra1", "ftq10a_bra2", "ftq10a_bra2", "ftq10a_bra2", "d1ccode", "d1ccode"] value= ["1", "2", "3", "1", "2", "3", "rgDEU", "rgFRA"] label= ["promotor", "passive", "detractor", "promotor", "passive", "detractor", "Germany", "France"] data_tuples = list(zip(var,value, label)) val_labels = pd.DataFrame(data_tuples, columns=['var','value', 'label']) #Filter out everything that can not be cast to numeric is_string = pd.isnull(pd.to_numeric(val_labels['value'], errors='coerce')) string_labels = val_labels[is_string] numeric_labels = val_labels[~is_string] #cast the ones that can be converted to an integer numeric_labels = numeric_labels.astype({"value": int}) variable_value_labels = {k: f.groupby('value')['label'].apply(lambda x: x.iloc[0]).to_dict() for k, f in string_labels.groupby('var')} d2 = {k: f.groupby('value')['label'].apply(lambda x: x.iloc[0]).to_dict() for k, f in numeric_labels.groupby('var')} #combine them again variable_value_labels.update(d2) RE: groupby cast to int where possible - deanhystad - Jul-08-2021 If a value can be an int or a str I would cast to str unless it is going to be used as an int. |