Python Forum

Full Version: value_counts method question
I hope you are all having a good day. I am currently teaching myself Python through a MOOC. I have a question about the following lines of code:

test=list()
for num in sal['JobTitle']:
    if 'chief' in num.lower():
        sum1=True
    else:
        sum1=False
        test.append(sum1)
test2=pd.DataFrame(test)
test2['sum1'].value_counts()
I am trying to see the number of True and False responses through the value_counts method. However, I receive an error message stating "an integer is required." I am not looking for an entirely new way of obtaining this answer, rather, I am trying to figure out how to go about this method and obtain the solution. I really want to learn all the intricacies of Python.

Moderator: Please use code tags in the future, so code is easier to read.
- nilamo
Instead of True/False, does it work with 1/0? That might just be pandas refusing to add boolean values together implicitly.
Pandas has no problem counting values in boolean columns, but there are some problems with your code:

When you convert your list to a dataframe, you do not supply a column name, nor do you rename the column later, so your dataframe has only one column, labelled 0 (an integer label). When you try to select test2['sum1'], it raises an error because the column labels contain only integers (it probably raises a KeyError somewhere too). You can access your column with test2[0] or with iloc/loc, but it would be better to rename it or to provide a column name when you create the dataframe (either with the columns parameter or with pd.DataFrame({'sum1': test}) ).

Also, you are appending to test only in the else clause, so your list will consist only of False values (a misindented line?).
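To illustrate the column-name point, here is a minimal sketch. The list contents are made up for the example (they are not from the original data), and it only assumes that pandas is installed:

```python
import pandas as pd

# Hypothetical stand-in for the list built in the loop.
test = [True, False, False, True, False]

# Without a column name, the only column is labelled with the integer 0,
# so unnamed['sum1'] would fail while unnamed[0] works.
unnamed = pd.DataFrame(test)
print(list(unnamed.columns))

# Either of these gives the column the name 'sum1'.
named_a = pd.DataFrame(test, columns=['sum1'])
named_b = pd.DataFrame({'sum1': test})

# value_counts() counts boolean values without any conversion to 1/0.
counts = named_a['sum1'].value_counts()
print(counts)
```

Both constructions should produce the same dataframe, so which one to use is mostly a matter of taste.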
(Mar-22-2017, 10:15 PM) zivoni wrote: Pandas has no problem counting values in boolean columns, but there are some problems with your code: [...]

Thank you so much for the help. I forgot that I needed to pass a columns parameter to the DataFrame constructor if I wanted to reference the column as "sum1". Additionally, I did forget to add an append call in my if clause as well. Thanks for noticing that.
You do not need to add another append(); de-indenting the .append() call in the else clause will do it. It will then always be executed, regardless of whether sum1 was set in the if branch or the else branch.
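As a sketch of what the de-indented version might look like: the sal dataframe below is a hypothetical stand-in with a few invented job titles, since the original data is not shown in the thread.

```python
import pandas as pd

# Hypothetical sample data standing in for the original sal dataframe.
sal = pd.DataFrame({'JobTitle': ['Chief of Police', 'Nurse', 'Fire Chief', 'Clerk']})

test = []
for num in sal['JobTitle']:
    if 'chief' in num.lower():
        sum1 = True
    else:
        sum1 = False
    # De-indented: this runs on every iteration, after either branch.
    test.append(sum1)

print(test)
```

With the append at loop level, the list gets one entry per row instead of only the False cases.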

And as you are using your if/else only to set the value of sum1 to True/False, a simpler way is to assign the result of your condition directly to sum1:
for num in sal['JobTitle']:
    sum1 = 'chief' in num.lower()
    test.append(sum1)
Or you can do it directly with test.append('chief' in num.lower()), without using the sum1 variable at all, but perhaps the code above is a little easier to understand.
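Putting the pieces of this thread together, an end-to-end version could look roughly like the sketch below. Again, the sal dataframe and its job titles are invented for illustration and are not the original data.

```python
import pandas as pd

# Hypothetical sample data; the real sal dataframe is not shown in the thread.
sal = pd.DataFrame({'JobTitle': ['CHIEF OF POLICE', 'Nurse', 'Deputy Chief', 'Clerk', 'Cook']})

test = []
for num in sal['JobTitle']:
    # The membership test already evaluates to True or False.
    test.append('chief' in num.lower())

# Name the column when creating the dataframe so it can be selected as 'sum1'.
test2 = pd.DataFrame({'sum1': test})
result = test2['sum1'].value_counts()
print(result)
```

The loop is kept on purpose, since the question was about making this approach work rather than replacing it.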