Python Forum

pandas df inside a df question
Hello,

Here is my initial code:

import pandas as pd

# Parse the input CSV file
df = pd.read_csv('employees.csv')

# Keep only the employees who have not taken the training
df = df[df['Training'] == 'No']
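The snippet above needs employees.csv on disk. Here is a self-contained sketch of the same filter that can be run anywhere. It assumes a small, made-up CSV held in a string: the employee names and rows are hypothetical, and only the Name, Department and Training column names come from the thread.

```python
import io

import pandas as pd

# Hypothetical stand-in for employees.csv (made-up rows for illustration).
csv_text = """Name,Department,Training
Employee A,Engineering,Yes
Employee B,Human Resources,No
Employee C,Management,No
"""

# read_csv accepts any file-like object, so StringIO can replace the file.
df = pd.read_csv(io.StringIO(csv_text))

# Keep only the rows whose Training value is the string 'No'.
df = df[df["Training"] == "No"]
print(df)
```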
I'm trying to understand
df[df['Training'] == 'No']
I understand the inner part:
df['Training']
This returns only the Training column data. When I add == 'No' to the end of that, the output turns into Boolean values: each 'No' becomes True, and everything else becomes False.

Output:
0    Yes
1     No
2     No
3     No
4    Yes
5     No
6     No
7    Yes
8     No
9     No
Name: Training, dtype: object

Output:
0    False
1     True
2     True
3     True
4    False
5     True
6     True
7    False
8     True
9     True
Name: Training, dtype: bool
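These two outputs can be reproduced without the CSV file. The sketch below assumes a Series holding the Training values from the first output. Note that the comparison value has to be the quoted string 'No'; a bare No is an undefined name in Python and raises NameError.

```python
import pandas as pd

# The Training values shown in the first output, rows 0 to 9.
training = pd.Series(
    ["Yes", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "No"],
    name="Training",
)

# Comparing a Series with a single value compares every element and
# returns a new Series of booleans with the same index and name.
mask = training == "No"
print(mask)
print(mask.dtype)
```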
But if I put that back inside another df[] like this:
df[df['Training'] == 'No']
then the output includes the other columns from the CSV file again and looks like this:

Output:
   Name            Department              Training  Boss Email
1  John Doe        Human Resources         No        [email protected]
2  James Smith     Engineering             No        [email protected]
3  Jane Anderson   Engineering             No        [email protected]
5  Derrick Wheels  Information Technology  No        [email protected]
6  George Thomas   Human Resources         No        [email protected]
8  Brandon Combs   Information Technology  No        [email protected]
9  Jason Baxter    Management              No        [email protected]
I don't understand how this happens. How does putting all of that inside another df[] filter the original CSV data down to the rows where Training equals 'No', and then bring back all of the other columns?

Does anyone have a better way of explaining it to me?

Thank you in advance,

mbaker_wv
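One way to picture what df[...] does when it receives a boolean Series: it goes through the index and keeps each row whose mask value at that label is True, together with all of that row's columns. The sketch below uses hypothetical placeholder names; only the Training pattern comes from the output above. It compares df[mask] with a manual loop built on df.loc. The loop is just an illustration of the idea, not how pandas implements it internally.

```python
import pandas as pd

# Hypothetical data with the same Training pattern as in the thread.
df = pd.DataFrame(
    {
        "Name": [f"Employee {i}" for i in range(10)],
        "Training": ["Yes", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "No"],
    }
)

mask = df["Training"] == "No"

# Boolean Series inside df[...]: keep the rows where the mask is True.
filtered = df[mask]

# Manual illustration: collect the index labels whose mask value is True,
# then select those whole rows (all columns) with .loc.
kept_labels = [label for label in df.index if mask[label]]
manual = df.loc[kept_labels]

print(filtered)
print(filtered.equals(manual))
```

The original index labels (1, 2, 3, 5, 6, 8, 9) are kept in the result, which is why the filtered output in the question skips 0, 4 and 7.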
@deanhystad

So do I understand this now?

df['Training'] == 'No' returns a Series, which is one-dimensional and holds a single column of values.

Placing it inside another df[] returns a DataFrame, which is two-dimensional and has both rows and columns.

mbaker_wv
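A quick way to check this is to inspect type() and .ndim on each object. The mask has exactly one True/False value per row, and df[...] uses it to choose rows while keeping every column. Here is a minimal sketch on a small hypothetical DataFrame (the Dept column and its values are made up):

```python
import pandas as pd

# Hypothetical three-row DataFrame for illustration.
df = pd.DataFrame({"Training": ["Yes", "No", "No"], "Dept": ["HR", "IT", "HR"]})

mask = df["Training"] == "No"
result = df[mask]

# The mask is a one-dimensional Series with one value per row of df.
print(type(mask).__name__, mask.ndim, len(mask) == len(df))

# The result is a two-dimensional DataFrame that keeps every column.
print(type(result).__name__, result.ndim, list(result.columns))
```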
I have a DataFrame:
import pandas as pd

df = pd.DataFrame(range(1, 7), columns=["numbers"])
Output:
   numbers
0        1
1        2
2        3
3        4
4        5
5        6
I create a Series (kind of like a one-column DataFrame, kind of like a list or array). The Series contains True wherever the corresponding value in "numbers" is not evenly divisible by 2.
odd_series = df["numbers"] % 2 != 0
print(odd_series)
Output:
0     True
1    False
2     True
3    False
4     True
5    False
I use this Series to create a new DataFrame, selecting only the rows of "df" where "odd_series" is True.
odd_df = df[odd_series]
print(odd_df)
Output:
   numbers
0        1
2        3
4        5
Note that the original DataFrame "df" is unchanged, and so is odd_series.
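Here is a short sketch of that point, reusing the numbers DataFrame from above. df.loc[odd_series] selects the same rows as df[odd_series]. df keeps all six rows until the result is assigned back to a name, which is what df = df[df['Training'] == 'No'] does in the first post.

```python
import pandas as pd

df = pd.DataFrame(range(1, 7), columns=["numbers"])
odd_series = df["numbers"] % 2 != 0

odd_df = df[odd_series]
odd_loc = df.loc[odd_series]  # explicit row selection with .loc

# Both spellings select the same rows; df itself still has six rows.
print(odd_df.equals(odd_loc))
print(len(df))

# Filtering never modifies df in place; to keep the result, assign it.
df = df[odd_series]
print(df["numbers"].tolist())
```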

I can condense this:
import pandas as pd

df = pd.DataFrame(range(1, 7), columns=["numbers"])
print(df[df["numbers"] % 2 != 0])
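The condensed line builds the same mask inline, so it gives the same result as the step-by-step version. The ~ operator is shown here as an extra. It flips every value in a boolean mask, which selects the complementary rows. The parentheses matter because ~ binds more tightly than % and !=.

```python
import pandas as pd

df = pd.DataFrame(range(1, 7), columns=["numbers"])

# Step-by-step version: build the mask first, then filter.
odd_series = df["numbers"] % 2 != 0
step_by_step = df[odd_series]

# Condensed version: the mask is built inline inside the brackets.
condensed = df[df["numbers"] % 2 != 0]

# ~ inverts the whole mask; without the parentheses it would apply to
# df["numbers"] alone and compute something different.
even_df = df[~(df["numbers"] % 2 != 0)]

print(step_by_step.equals(condensed))
print(even_df["numbers"].tolist())
```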
@deanhystad
Thank you for the output. I think I'm on my way to understanding this better now.

mbaker_wv