pandas df inside a df question

mbaker_wv · Dec-24-2022, 02:52 AM

Hello,

Here is my initial code:

import pandas as pd

# Parse the input CSV file
df = pd.read_csv('employees.csv')

# Keep only the employees who have not taken the training
df = df[df['Training'] == 'No']

I'm trying to understand this expression:

df[df['Training'] == 'No']

I understand the inner part first:

df['Training']

This returns only the Training column data. When I add == 'No' to the end of that, it turns the output into Boolean values: every No becomes True, while everything else becomes False.

Output:
0    Yes
1     No
2     No
3     No
4    Yes
5     No
6     No
7    Yes
8     No
9     No
Name: Training, dtype: object

Output:
0    False
1     True
2     True
3     True
4    False
5     True
6     True
7    False
8     True
9     True
Name: Training, dtype: bool
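
For anyone following along without employees.csv, the two outputs above can be reproduced with a small inline DataFrame. This is only a sketch: the ten Training values are copied from the output above, the other columns are left out, and the name mask is made up for illustration.

```python
import pandas as pd

# Stand-in for employees.csv: only the Training column,
# with values copied from the output shown above
df = pd.DataFrame(
    {"Training": ["Yes", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "No"]}
)

# Comparing a column to a string compares every element,
# which gives a Series of True/False with the same index
mask = df["Training"] == "No"

print(df["Training"])
print(mask)
```

The comparison runs element by element, so mask has one True/False entry per row of df.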

But if I put that back inside another df[] like this:

df[df['Training'] == 'No']

then the output includes the rest of the columns from the CSV file and looks like this:

Output:
             Name              Department Training           Boss Email
1        John Doe         Human Resources       No  [email protected]
2     James Smith             Engineering       No  [email protected]
3   Jane Anderson             Engineering       No  [email protected]
5  Derrick Wheels  Information Technology       No   [email protected]
6   George Thomas         Human Resources       No  [email protected]
8   Brandon Combs  Information Technology       No   [email protected]
9    Jason Baxter              Management       No   [email protected]

I don't understand how this happens. How does putting all of that inside another df[] filter the original CSV data down to the rows where Training equals No, and then return those rows with all of the other columns from the main CSV file?

Does anyone have a better way of explaining it to me?

Thank you in advance,

mbaker_wv

**deanhystad** · Dec-24-2022, 05:00 AM

Read this:

https://pandas.pydata.org/docs/getting_s..._data.html

mbaker_wv · Dec-24-2022, 02:29 PM

@deanhystad

So do I understand this now?

df['Training'] == 'No' returns a Series, which is one-dimensional and holds only a single column of values.

Placing it back inside another df[] returns a DataFrame, which is two-dimensional and shows both rows and columns.
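
A quick way to check this is to print the type and number of dimensions of each result. This is a rough sketch with a hypothetical three-row stand-in for employees.csv; the names mask and filtered are made up for illustration.

```python
import pandas as pd

# Hypothetical two-column stand-in for employees.csv
df = pd.DataFrame(
    {
        "Name": ["A", "B", "C"],
        "Training": ["Yes", "No", "No"],
    }
)

mask = df["Training"] == "No"   # one True/False per row
filtered = df[mask]             # the rows of df where mask is True

print(type(mask).__name__, mask.ndim)          # Series 1
print(type(filtered).__name__, filtered.ndim)  # DataFrame 2
print(filtered)
```

The Series is only used to pick rows. The result keeps every column of df, which is why the Name column shows up again in the filtered output.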

mbaker_wv

**deanhystad** · Dec-24-2022, 04:15 PM

I have a DataFrame

import pandas as pd

df = pd.DataFrame(range(1, 7), columns=["numbers"])

Output:
   numbers
0        1
1        2
2        3
3        4
4        5
5        6

I create a Series (kind of like a one column dataframe, kind of like a list or array). The Series contains True when the corresponding "numbers" is not evenly divisible by 2.

odd_series = df["numbers"] % 2 != 0
print(odd_series)

Output:
0     True
1    False
2     True
3    False
4     True
5    False

I use this series to create a new dataframe, selecting only the rows from "df" that are True in "odd_series".

odd_df = df[odd_series]
print(odd_df)

Output:
   numbers
0        1
2        3
4        5

Note that the original dataframe "df" is unchanged. odd_series is also unchanged.
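
This can be confirmed with a short check. It is a sketch that rebuilds the same "df", "odd_series" and "odd_df" as above and then looks at their sizes and row labels.

```python
import pandas as pd

df = pd.DataFrame(range(1, 7), columns=["numbers"])
odd_series = df["numbers"] % 2 != 0
odd_df = df[odd_series]

# df still has all six rows; odd_df is a separate, smaller dataframe
print(len(df), len(odd_df))      # 6 3

# odd_df keeps the row labels the rows had in df
print(odd_df.index.tolist())     # [0, 2, 4]
```

The row labels in odd_df are the original ones, which is also why the filtered output in the first post shows 1, 2, 3, 5, 6, 8, 9 instead of being renumbered.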

I can condense this:

import pandas as pd

df = pd.DataFrame(range(1, 7), columns=["numbers"])
print(df[df["numbers"] % 2 != 0])
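
To see that the condensed form gives the same result as the two-step form, the two can be compared with DataFrame.equals. A minimal sketch; the names two_step and one_line are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(range(1, 7), columns=["numbers"])

# Two steps: build the boolean Series first, then use it to select rows
odd_series = df["numbers"] % 2 != 0
two_step = df[odd_series]

# One line: the boolean Series is created inside the brackets
one_line = df[df["numbers"] % 2 != 0]

print(two_step.equals(one_line))   # True
```

The inner expression is evaluated first and produces the True/False Series. The outer df[] then receives that Series exactly as if it had been stored in a variable.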

mbaker_wv · Dec-25-2022, 01:11 AM

@deanhystad
Thank you for the output. I think I'm on my way to understanding this better now.

mbaker_wv
