Python Forum
pandas df inside a df question
#1
hello

here is my initial code

# Parse the input CSV file
df = pd.read_csv('employees.csv')

# Filter out employees who have not taken the training
df = df[df['Training'] == 'No']
I'm trying to understand
df[df['Training'] == 'No']
I understand the first inner
df['Training']
This returns only the Training column data. When I add == 'No' after it, the output becomes a Series of Boolean values: each 'No' becomes True, while everything else becomes False.

Output:
0    Yes
1     No
2     No
3     No
4    Yes
5     No
6     No
7    Yes
8     No
9     No
Name: Training, dtype: object
Output:
0    False
1     True
2     True
3     True
4    False
5     True
6     True
7    False
8     True
9     True
Name: Training, dtype: bool
But if I add that back into another df[] like this:
df[df['Training'] == 'No']
then the output includes the rest of the columns from the csv file and looks like this:

Output:
   Name             Department              Training  Boss Email
1  John Doe         Human Resources         No        [email protected]
2  James Smith      Engineering             No        [email protected]
3  Jane Anderson    Engineering             No        [email protected]
5  Derrick Wheels   Information Technology  No        [email protected]
6  George Thomas    Human Resources         No        [email protected]
8  Brandon Combs    Information Technology  No        [email protected]
9  Jason Baxter     Management              No        [email protected]
I don't understand how this happens. How does putting all that inside another df[] filter the original csv data for rows where Training equals No, and then return those rows with all of the other columns?

Does anyone have a better way of explaining it to me?

Thank you in advance,

mbaker_wv
#2
Read this:

https://pandas.pydata.org/docs/getting_s..._data.html
#3
@deanhystad

So do I understand this now?

df['Training'] == 'No' returns a Series, which is one-dimensional and holds only a single column of values.

Placing it back inside another df[] returns a DataFrame, which is two-dimensional and shows both columns and rows.
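A minimal sketch of that distinction, for checking the types and shapes directly. The three-row frame below is a made-up stand-in for employees.csv (the names and values are assumptions, not data from the thread); only the Training column matters here.

```python
import pandas as pd

# Hypothetical stand-in for employees.csv; values are invented for illustration.
df = pd.DataFrame(
    {
        "Name": ["Ann", "Bob", "Cy"],
        "Department": ["HR", "Engineering", "IT"],
        "Training": ["Yes", "No", "No"],
    }
)

mask = df["Training"] == "No"  # boolean Series: one True/False per row
filtered = df[mask]            # DataFrame: only the rows where mask is True

print(type(mask).__name__, mask.shape)          # Series (3,)
print(type(filtered).__name__, filtered.shape)  # DataFrame (2, 3)
print(list(filtered.index))                     # [1, 2]
```

The Series is not converted into a DataFrame. It is used as a row selector: df[mask] keeps every column of df, but only the rows whose position in mask is True, which is why the original index labels survive.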

mbaker_wv
#4
I have a DataFrame
import pandas as pd

df = pd.DataFrame(range(1, 7), columns=["numbers"])
Output:
   numbers
0        1
1        2
2        3
3        4
4        5
5        6
I create a Series (kind of like a one column dataframe, kind of like a list or array). The Series contains True when the corresponding "numbers" is not evenly divisible by 2.
odd_series = df["numbers"] % 2 != 0
print(odd_series)
Output:
0     True
1    False
2     True
3    False
4     True
5    False
I use this series to create a new dataframe, selecting only the rows from "df" that are True in "odd_series".
odd_df = df[odd_series]
print(odd_df)
Output:
   numbers
0        1
2        3
4        5
Note that the original dataframe "df" is unchanged. odd_series is also unchanged.
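A short sketch of why the original is unchanged, and of what the first post's df = df[...] line does differently. It reuses the same "numbers" example; nothing here goes beyond what the posts above already use.

```python
import pandas as pd

df = pd.DataFrame(range(1, 7), columns=["numbers"])
odd_series = df["numbers"] % 2 != 0

odd_df = df[odd_series]      # builds a new DataFrame; df is not modified
print(len(df), len(odd_df))  # 6 3

# Assigning the result back to the same name, as in
# df = df[df['Training'] == 'No'], rebinds df to the filtered rows.
df = df[odd_series]
print(len(df))               # 3
```

So the filtering itself never edits the data in place. The original frame only appears to change when the result is assigned back to the same variable name.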

I can condense this:
import pandas as pd

df = pd.DataFrame(range(1, 7), columns=["numbers"])
print(df[df["numbers"] % 2 != 0])
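As a rough check that the condensed form is plain row selection, the sketch below compares it with two other spellings: the explicit .loc indexer, and a hand-written list of booleans in place of the computed Series. The variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(range(1, 7), columns=["numbers"])

condensed = df[df["numbers"] % 2 != 0]
with_loc = df.loc[df["numbers"] % 2 != 0]               # explicit row selection
by_hand = df[[True, False, True, False, True, False]]   # hand-written mask

print(condensed.equals(with_loc))  # True
print(condensed.equals(by_hand))   # True
```

The inner expression only has to produce one True/False per row. The outer df[...] does not care how those values were computed.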
#5
@deanhystad
Thank you for the output. I think I'm on my way to understanding this more now.

mbaker_wv