 value_counts method question
#1
I hope you are all having a good day. I am currently teaching myself Python through an online MOOC. I have a question about the following code:

test=list()
for num in sal['JobTitle']:
    if 'chief' in num.lower():
        sum1=True
    else:
        sum1=False
        test.append(sum1)
test2=pd.DataFrame(test)
test2['sum1'].value_counts()
I am trying to see the number of True and False responses through the value_counts method. However, I receive an error message stating "an integer is required." I am not looking for an entirely new way of obtaining this answer; rather, I am trying to figure out how to make this approach work. I really want to learn all the intricacies of Python.

Moderator: Please use code tags in the future, so code is easier to read.
- nilamo
#2
Instead of True/False, does it work with 1/0? That might just be pandas refusing to add boolean values together implicitly.
#3
Pandas has no problem counting values in boolean columns, but there are some problems with your code:

When you convert your list to a DataFrame, you neither supply a column name nor rename the column later, so your DataFrame has a single column labelled 0 (an integer label). When you try to select test2['sum1'], it raises an error because the column index contains only integers (it probably raises a KeyError somewhere too). You can access your column with test2[0] or with iloc/loc (ix is deprecated and was removed in pandas 1.0), but it would be better to rename it or to provide a column name when you create the DataFrame (either through the columns parameter or with pd.DataFrame({'sum1': test})).

And you are appending to test only in the else clause, so your list will consist only of False values (misindented line?).
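A minimal sketch of the column-name point, assuming a small made-up list in place of the one built from sal['JobTitle'] (the names unnamed, named_a and named_b are only for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the list built in the loop.
test = [True, False, False, True, False]

# No column name supplied: the single column is labelled 0.
unnamed = pd.DataFrame(test)
print(list(unnamed.columns))        # [0]
print('sum1' in unnamed.columns)    # False

# Either of these gives a column labelled 'sum1'.
named_a = pd.DataFrame(test, columns=['sum1'])
named_b = pd.DataFrame({'sum1': test})

# value_counts works on the boolean column once it can be selected by name.
counts = named_a['sum1'].value_counts()
print(counts)
```

Renaming afterwards with unnamed.rename(columns={0: 'sum1'}) should give the same result.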
#4
(Mar-22-2017, 10:15 PM)zivoni Wrote: Pandas has no problem counting values in boolean columns, but there are some problems with your code:

When you convert your list to a DataFrame, you neither supply a column name nor rename the column later, so your DataFrame has a single column labelled 0 (an integer label). When you try to select test2['sum1'], it raises an error because the column index contains only integers (it probably raises a KeyError somewhere too). You can access your column with test2[0] or with ix/iloc/loc, but it would be better to rename it or to provide a column name when you create the DataFrame (either through the columns parameter or with pd.DataFrame({'sum1': test})).

And you are appending to test only in the else clause, so your list will consist only of False values (misindented line?).

Thank you so much for the help. I forgot that I needed to pass a columns parameter to the DataFrame constructor if I wanted to reference the column as "sum1". Additionally, I forgot to add an append call in my if clause. Thanks for noticing that.
#5
You do not need to add another append(); de-indenting the .append() line in the else clause will do it. It will then always be executed, regardless of whether sum1 was set in the if branch or the else branch.
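As a sketch of that de-indented version, assuming a short made-up list of titles (job_titles is hypothetical and stands in for sal['JobTitle']):

```python
# Hypothetical job titles standing in for sal['JobTitle'].
job_titles = ['Chief of Police', 'Firefighter', 'Deputy Chief']

test = []
for num in job_titles:
    if 'chief' in num.lower():
        sum1 = True
    else:
        sum1 = False
    # Indented one level less than the else body, so it runs on every iteration.
    test.append(sum1)

print(test)  # [True, False, True]
```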

And since you are using your if/else only to set sum1 to True/False, a simpler way is to assign the result of your condition directly to sum1:
for num in sal['JobTitle']:
    sum1 = 'chief' in num.lower()
    test.append(sum1)
Or you can do it directly with test.append('chief' in num.lower()), without using the sum1 variable at all, but perhaps the code above is a little easier to understand.
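Putting both fixes together, a runnable sketch might look like this. The sal DataFrame here is a tiny hypothetical stand-in for the course dataset, which is not shown in the thread:

```python
import pandas as pd

# Hypothetical stand-in for the course's `sal` dataset.
sal = pd.DataFrame({
    'JobTitle': ['Chief of Police', 'Firefighter', 'CHIEF INVESTIGATOR', 'Nurse']
})

test = []
for num in sal['JobTitle']:
    # Append on every iteration, not only when the condition is False.
    test.append('chief' in num.lower())

# Name the column so it can be selected as test2['sum1'].
test2 = pd.DataFrame({'sum1': test})
counts = test2['sum1'].value_counts()
print(counts)
```

This keeps the original loop-and-list approach and changes only the indentation and the column name.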