Python Forum

pd.query method question
Hi,

I have a dataframe for which I want to create a filter.
I know you can do it via Boolean indexing, but I have now also found the df.query method.

df:
  col1  col2
0  Jan     3
1  Feb     4
2  Mar     5
df.query('col1 == "Feb"')
works as expected, but I wonder how to filter on 2 columns, because
df.query('col1 == "Feb" and col2 == "Mar"')
doesn't seem to work.

Thanks!
The expression should work as I understand it. However, it doesn't match the data in your dataframe: col2 appears to hold numbers, not strings. Can you confirm the data types in the columns?
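To check the data types and filter on both columns at once, something like the following should do it. This is only a minimal sketch: it assumes the frame from the first post, with col2 stored as integers, and the name `match` is just for illustration.

```python
import pandas as pd

# Sample frame rebuilt from the first post (assumption: col2 holds integers).
df = pd.DataFrame({"col1": ["Jan", "Feb", "Mar"], "col2": [3, 4, 5]})

# Inspect the column types before writing the filter.
print(df.dtypes)

# Filter on two columns: compare col1 to a string and col2 to a number.
match = df.query('col1 == "Feb" and col2 == 4')
print(match)
```

Here "and" is the right operator because the two conditions apply to different columns of the same row.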
Oh, sorry for the confusion.

The second snippet should be
df.query('col1 == "Feb" and col1 == "Mar"')
which gives me only the column names and no data:
col1	col2
In that case, you want to change the operator to "or":

df.query('col1 == "Feb" or col1 == "Mar"')
No individual value in col1 can equal both "Feb" and "Mar" at the same time, so "and" matches no rows.
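A short sketch of the difference, assuming the frame from the first post. The variable names (`both`, `either`, `months`, `member`) are made up for the example, and the `in` form is an alternative I would reach for once the list of values gets longer:

```python
import pandas as pd

# Sample frame rebuilt from the first post (assumption: col2 holds integers).
df = pd.DataFrame({"col1": ["Jan", "Feb", "Mar"], "col2": [3, 4, 5]})

# "and" needs both conditions to hold for the same row, so nothing matches.
both = df.query('col1 == "Feb" and col1 == "Mar"')

# "or" keeps every row where at least one condition holds.
either = df.query('col1 == "Feb" or col1 == "Mar"')

# Membership test against a Python list; @ refers to a local variable.
months = ["Feb", "Mar"]
member = df.query("col1 in @months")

print(both.empty)
print(either)
```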
OK, I see! Thanks for the clarification :)

Maybe a follow-up question: in which cases is it better to use Boolean indexing and in which df.query, or are the two similar in functionality?

Cheers
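Boolean indexing, as I understand it, selects data with a mask of True/False values. The idea comes from NumPy, where the mask picks individual elements out of an array. In pandas, df[mask] with a Boolean Series keeps the rows where the mask is True.

For filtering rows the two are similar: df.query() and Boolean indexing return the same rows for the same condition. The difference is mostly in how the condition is written. query() takes the condition as a string, which can be easier to read when there are several conditions, while Boolean indexing builds the mask from ordinary Python expressions and also works directly on NumPy arrays.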
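For what it's worth, on a DataFrame a Boolean mask built from a column selects whole rows, so it can be compared directly with query(). A minimal sketch, assuming the frame from the first post; `mask`, `by_mask` and `by_query` are names chosen for the example:

```python
import pandas as pd

# Sample frame rebuilt from the first post (assumption: col2 holds integers).
df = pd.DataFrame({"col1": ["Jan", "Feb", "Mar"], "col2": [3, 4, 5]})

# Boolean indexing: build a True/False Series, then use it to pick rows.
# Note the | operator and the parentheses around each comparison.
mask = (df["col1"] == "Feb") | (df["col1"] == "Mar")
by_mask = df[mask]

# The same filter written as a query string.
by_query = df.query('col1 == "Feb" or col1 == "Mar"')

# Both approaches select the same rows.
print(by_mask.equals(by_query))
```

Which one to use is largely a matter of taste: the query string tends to be shorter, while the mask can be stored, combined with other masks, and reused.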