Python Forum

Full Version: How to search for specific string in Pandas dataframe
Hi,
I'm trying to extract rows from my dataframe using Pandas, based on a specific column named Equipe_Junior. So far I have been able to extract my data when asking for the complete string, for example: Quebec Remparts [QMJHL]. But I would like to go through my dataframe for all [QMJHL] or [OHL] or any junior league, so I can work out stats per league without having to ask for a specific junior team.

This is my code and results. Thanks for your help.

import pandas as pd
data= pd.read_csv(r'C:\Users\ben\PycharmProjects\draft2020\hockey_draft2012_click_test.csv')
df = pd.DataFrame(data, columns=['Ronde','Equipe','Nom','Equipe_Junior','MJ'])  # choose column from csv
df = df.fillna(0)  # replace nan with 0
select = df.loc[df['Equipe_Junior'] =='Quebec Remparts [QMJHL]']  # select players from that team only
print(select)
Output:
     Ronde   Equipe                 Nom            Equipe_Junior     MJ
11       1  Buffalo  Mikhail Grigorenko  Quebec Remparts [QMJHL]  217.0
123      5  Calgary         Ryan Culkin  Quebec Remparts [QMJHL]    0.0
165      6   Ottawa   Francois Brassard  Quebec Remparts [QMJHL]    0.0
(Oct-22-2020, 07:19 PM) Coding_Jam wrote: [post quoted above]

Hey! Maybe you could use loc and str.contains? Something like this would select the rows containing either "QMJHL" or "OHL":

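# na=False treats non-string cells (e.g. the 0 left by fillna) as "no match"; (?:...) avoids the match-group warning
df.loc[df['Equipe_Junior'].str.contains(r'\[(?:QMJHL|OHL)\]', na=False)]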
In the code above, str.contains builds a boolean mask that is True for every row whose Equipe_Junior value contains either league, and loc then keeps only the rows where the mask is True.
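If it helps, here is a minimal sketch of the mask idea on a few sample rows. Only the Grigorenko row comes from the thread; the other rows, the `pattern` and `selected` names, and the extra `Ligue` column are assumptions made up for illustration. It also assumes the missing teams were replaced by 0 with fillna, as in the original code, which is why `na=False` is passed.

```python
import pandas as pd

# Sample rows standing in for the CSV; all but the first are hypothetical.
df = pd.DataFrame({
    "Nom": ["Mikhail Grigorenko", "Player A", "Player B", "Player C"],
    "Equipe_Junior": [
        "Quebec Remparts [QMJHL]",
        "Example Team [OHL]",
        "Example Team [WHL]",
        0,  # what fillna(0) leaves behind for a missing team
    ],
})

# Brackets are escaped so they match literally; (?:...) is a
# non-capturing group, so pandas does not warn about match groups.
pattern = r"\[(?:QMJHL|OHL)\]"

# Boolean mask: True where the league tag is found.
# na=False makes non-string cells (the 0 above) count as False, not NaN.
mask = df["Equipe_Junior"].str.contains(pattern, na=False)
selected = df.loc[mask]
print(selected)

# Optional: put the league code in its own column for per-league stats.
df["Ligue"] = df["Equipe_Junior"].str.extract(r"\[([^\]]+)\]", expand=False)
counts = df["Ligue"].value_counts()
print(counts)
```

With a league column like this, something like df.groupby('Ligue') could then give stats for every league at once instead of filtering one league at a time.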

Hope it works!

Best,

E