Python Forum
groupby cast to int where possible
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
groupby cast to int where possible
#1
Hello,

I need to create a nested dictionary like so:
d= {"ftsg2a_co422": {1: 'visited or contacted', 0: 'not visited not contacted'}}

this dictionary then goes to a function that sets value labels to an SPSS file.

The data comes from a database where these 0's and 1's can be a string.
When I execute the code below, I get
'1': 'promotor'

Is there a way where I can modify this groupby, cast the value to integer where possible, so that I get
1 : promotor
'rgDEU': 'Germany'
etc?

I hope my question makes sense...

Regards,
Mikis

import pandas as pd
#So all of this is read in from a database:
var= ["ftq10a_bra1", "ftq10a_bra1", "ftq10a_bra1", "ftq10a_bra2", "ftq10a_bra2", "ftq10a_bra2", "d1ccode", "d1ccode"]
value= ["1", "2", "3", "1", "2", "3", "rgDEU", "rgFRA"]
label= ["promotor", "passive", "detractor", "promotor", "passive", "detractor", "Germany", "France"]
data_tuples = list(zip(var,value, label))
column_labels = pd.DataFrame(data_tuples, columns=['var','value', 'label']) 
   
#This is what I need to modify, so that "1" becomes 1
d = {k: f.groupby('value')['label'].apply(lambda x: x.iloc[0]).to_dict()
     for k, f in column_labels.groupby('var')}
Reply
#2
In case it helps anyone, I found a workaround for my issue.
Basically, I try to cast the column to integers, using errors="coerce".
If it can't be interpreted as a number, it converts it to nan.
I can then test if it's nan (and thus string) using pd.isnull()
This series of true/false is then used to split the dataframe in two. in the dataframe where it can be converted ,I cast the column to integers.
Finally, I apply my dictionary formula on both dataframes seperatly and combine both results.
It ain't pretty, but it works Big Grin

import pandas as pd
#So all of this is read in from a database:
var= ["ftq10a_bra1", "ftq10a_bra1", "ftq10a_bra1", "ftq10a_bra2", "ftq10a_bra2", "ftq10a_bra2", "d1ccode", "d1ccode"]
value= ["1", "2", "3", "1", "2", "3", "rgDEU", "rgFRA"]
label= ["promotor", "passive", "detractor", "promotor", "passive", "detractor", "Germany", "France"]
data_tuples = list(zip(var,value, label))
val_labels = pd.DataFrame(data_tuples, columns=['var','value', 'label']) 
    
#Filter out everything that can not be cast to numeric
is_string = pd.isnull(pd.to_numeric(val_labels['value'], errors='coerce'))
string_labels = val_labels[is_string]
numeric_labels = val_labels[~is_string]
#cast the ones that can be converted to an integer
numeric_labels = numeric_labels.astype({"value": int})

variable_value_labels = {k: f.groupby('value')['label'].apply(lambda x: x.iloc[0]).to_dict()
     for k, f in string_labels.groupby('var')}
d2 = {k: f.groupby('value')['label'].apply(lambda x: x.iloc[0]).to_dict()
     for k, f in numeric_labels.groupby('var')}
#combine them again
variable_value_labels.update(d2)
Reply
#3
If a value can be an int or a str I would cast to str unless it is going to be used as an int.
Reply


Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020