boolean operator question

smw10c · Apr-17-2017, 09:00 PM

I hope you are all having a good day. Why does the following code work (where dictframe is a DataFrame)

dictframe[(dictframe['population']>1) & (dictframe['population']<3)]

but the following does not:

dictframe[dictframe['population']>1 & dictframe['population']<3]

volcano63 · Apr-17-2017, 09:12 PM

I would guess - operator precedence, but also

& in Python is bitwise AND - use and for boolean expressions
in Python, you may chain comparison operators

1 < x < 3

***zivoni*** · Apr-17-2017, 09:24 PM

As volcano63 pointed out, you are getting an error due to operator precedence. & has higher precedence than >, so without parentheses Python evaluates 1 & dictframe['population'] first and then treats the rest as a chained comparison, which is not supported.

Chaining operators does not work with pandas Series (or NumPy arrays). You need to use parentheses or NumPy logical functions, like:

dictframe[ np.logical_and(dictframe.population > 1, dictframe.population < 3) ]
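
To see how Python groups the unparenthesized condition, here is a small sketch using the standard ast module. The names a and b are placeholders for the two column lookups, not anything from the original code, and nothing pandas-specific is involved.

```python
import ast

# Parse the unparenthesized condition; a and b stand in for the column lookups.
tree = ast.parse("a > 1 & b < 3", mode="eval").body

# The top-level node is one chained comparison, not an & of two comparisons.
print(type(tree).__name__)                     # Compare
print([type(op).__name__ for op in tree.ops])  # ['Gt', 'Lt']

# The middle operand of that chain is the bitwise expression 1 & b.
middle = tree.comparators[0]
print(type(middle).__name__, type(middle.op).__name__)  # BinOp BitAnd

# With parentheses, & becomes the top-level operation instead.
grouped = ast.parse("(a > 1) & (a < 3)", mode="eval").body
print(type(grouped).__name__, type(grouped.op).__name__)  # BinOp BitAnd
```

So the first form reads as a > (1 & b) < 3, while the second is a single & applied to two comparison results.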

smw10c · Apr-17-2017, 11:15 PM

Zivoni. Thank you for the comment. What do you mean by bitwise? I am still very confused on how "&" taking precedence over ">" changes the outcome. Thank you again for the help.

***metulburr*** · Apr-17-2017, 11:26 PM

(Apr-17-2017, 11:15 PM)smw10c Wrote: What do you mean by bitwise? I am still very confused on how "&" taking precedence over ">" changes the outcome.

bitwise and precedence
Precedence changes the value in the same way that 2 + 3 × 4 results in 14, not 20, due to order of operations.
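
A rough illustration with plain integers (the value of x is picked arbitrarily for the example): bitwise AND works on the binary digits, and because it binds tighter than the comparisons, the two spellings can give different answers.

```python
# Bitwise AND compares the binary digits of two integers position by position.
print(6 & 3)  # 0b110 & 0b011 == 0b010, i.e. 2
print(1 & 5)  # 0b001 & 0b101 == 0b001, i.e. 1

x = 5
# Without parentheses: parsed as x > (1 & x) < 3, i.e. 5 > 1 and 1 < 3.
unparenthesized = x > 1 & x < 3
# With parentheses: (5 > 1) & (5 < 3), i.e. True & False.
parenthesized = (x > 1) & (x < 3)
print(unparenthesized, parenthesized)  # True False
```

With plain integers the unparenthesized version runs without complaint and simply gives a different result.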

smw10c · Apr-17-2017, 11:32 PM

(Apr-17-2017, 11:26 PM)metulburr Wrote:
(Apr-17-2017, 11:15 PM)smw10c Wrote: What do you mean by bitwise? I am still very confused on how "&" taking precedence over ">" changes the outcome.
bitwise and precedence changes the value by the same manner in which 2 + 3 × 4 results to 14, not 20 due to order of operations.

Even if it changes the value, why wouldn't it just run and give me an incorrect result?

***snippsat*** · (This post was last modified: Apr-18-2017, 01:16 AM by snippsat.)

It works differently in pandas and NumPy, as @zivoni mentioned.
Pandas objects such as Series, and NumPy arrays, do not have a single boolean value.
They raise ValueError (they refuse to guess True or False).
So the normal Python and, or, not will not work.
Python:

>>> lst_1 = [1, 2, 3]
>>> bool(lst_1)
True
>>> lst_1 = [2, 8]
>>> lst_2 = [2, 8]
>>> lst_1 and lst_2
[2, 8]

Pandas:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randn(3,3))
>>> df
          0         1         2
0  0.518276  0.511278 -1.200522
1  0.301082  0.166139  0.173871
2 -0.968949  0.840400 -0.161232

>>> df[(df > .2) & (df < 1)]
          0         1   2
0  0.518276  0.511278 NaN
1  0.301082       NaN NaN
2       NaN  0.840400 NaN

>>> # Now look at the bool value
>>> bool(df[(df > 1)])
Traceback (most recent call last):
  File "<string>", line 301, in runcode
  File "<interactive input>", line 1, in <module>
  File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 917, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> # Same error for <and or not>

Boolean indexing

Quote: Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses.
