Nov-02-2020, 09:35 AM
(Oct-22-2020, 07:19 PM)Coding_Jam Wrote: Hi,
I'm trying to extract lines from my dataframe using Pandas in a specific column named Equipe_Junior. For now I have been able to extract my data when asking for the complete string, for example: Quebec Remparts [QMJHL]. But I would like to go through my dataframe for all [QMJHL] or [OHL] or any junior league so I can work stats with that, without having to ask for a specific junior team, just the league.
This is my code and results. Thanks for your help.
import pandas as pd
data= pd.read_csv(r'C:\Users\ben\PycharmProjects\draft2020\hockey_draft2012_click_test.csv')
df = pd.DataFrame(data, columns=['Ronde','Equipe','Nom','Equipe_Junior','MJ']) # choose column from csv
df = df.fillna(0) # replace nan with 0
select = df.loc[df['Equipe_Junior'] =='Quebec Remparts [QMJHL]'] # select players from that team only
print(select)
Result
Ronde Equipe Nom Equipe_Junior MJ
11 1 Buffalo Mikhail Grigorenko Quebec Remparts [QMJHL] 217.0
123 5 Calgary Ryan Culkin Quebec Remparts [QMJHL] 0.0
165 6 Ottawa Francois Brassard Quebec Remparts [QMJHL] 0.0
Hey! Maybe you could use loc and str.contains? Something like this would select the rows containing either "QMJHL" or "OHL":
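df.loc[df.loc[:, 'Equipe_Junior'].str.contains(r'QMJHL|OHL', na=False)]
In the code above, str.contains creates a boolean mask that is True for the rows containing either of the leagues, and loc selects the rows in the dataframe based on this mask. I left out the parentheses around the pattern because pandas warns about match groups in str.contains, and na=False makes any entry that is not a string (for example the 0 values left by fillna) count as "no match" instead of NaN, which loc cannot use as a mask.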
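If it helps, here is a small self-contained sketch of the same boolean-mask idea that you can run without your CSV. The sample rows and the helper name select_leagues are made up for illustration; only the column name Equipe_Junior comes from your post. It assumes the league always appears in square brackets, as in your output, and uses re.escape so the brackets are matched as literal text rather than as a regex character class.

```python
import re

import pandas as pd

# Made-up sample rows; only the column name 'Equipe_Junior' is from the post.
df = pd.DataFrame({
    'Nom': ['Player A', 'Player B', 'Player C', 'Player D'],
    'Equipe_Junior': [
        'Quebec Remparts [QMJHL]',
        'London Knights [OHL]',
        'Portland Winterhawks [WHL]',
        0,  # mimics a missing value that was replaced by fillna(0)
    ],
})


def select_leagues(frame, leagues, column='Equipe_Junior'):
    """Return the rows whose `column` contains one of the bracketed league tags.

    Hypothetical helper, not part of pandas.
    """
    # re.escape makes '[' and ']' literal, so '[OHL]' is matched as plain text.
    pattern = '|'.join(re.escape(f'[{league}]') for league in leagues)
    # na=False turns missing or non-string entries into False instead of NaN.
    mask = frame[column].str.contains(pattern, regex=True, na=False)
    return frame.loc[mask]


juniors = select_leagues(df, ['QMJHL', 'OHL'])
print(juniors)
```

With your real data you would pass your own df and whichever league codes you want to keep; the result is a normal dataframe, so you can compute your stats on it directly.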
Hope it works!
Best,
E