Python Forum
for loop in dataframe in pandas
#1
Hello,

I have a problem with a for loop over a pandas DataFrame and hope somebody can help.

I have the following dataframe in a csv file:

,forename,surname,gender,age,100m,200m,400m,800m,1500m
0,Migdalia,Parrish,F,18,11.08,29.0,59.41,122.05,259.11
1,Valerie,Lee,F,10,17.23,46.0,100.02,232.64,480.95
2,John,Debnam,M,17,10.81,25.89,50.6,110.29,232.39
3,Roy,Miller,M,10,19.18,46.74,95.32,201.14,430.27
4,Aida,Aumiller,F,11,15.3,41.83,81.06,189.03,394.9
5,Marcia,Brown,F,19,11.13,24.62,57.59,119.13,256.37
6,Harry,Knows,M,16,12.39,25.94,49.67,106.56,237.14
7,Barry,Lennon,M,14,11.15,23.56,46.46,110.89,230.49
8,Lilia,Armstrong,F,13,8.84,25.09,59.54,128.95,258.47
9,Johnny,Casey,M,15,9.65,22.67,49.46,112.85,233.87
10,Donald,Taylor,M,15,11.74,22.42,49.22,114.62,224.63
11,Martha,Woods,F,14,9.01,24.34,55.25,118.8,254.87
12,Diane,Lauria,F,15,8.99,27.92,54.79,119.89,249.21
13,Yvonne,Pumphrey,F,16,8.84,27.29,57.63,123.13,247.41
14,Betty,Stephenson,F,14,11.04,28.73,59.05,126.29,256.44
15,Lilia,Armstrong,F,12,11.31,34.43,74.28,150.05,321.07

And I have to create a main function that calls another function that, using a "for loop", retrieves the fastest time for each age (10,11,12,13,14,15,16) for a specific gender (e.g. 'F') and distance (e.g. '100m').

For example:
Input:
fastest_athletes = find_fastest_athletes(df,"100m","F",[10,11,12,13,14,15,16])
Output:
{
10: {'forename': 'Valerie', 'surname': 'Lee', 'time': '17.23'},
11: {'forename': 'Aida', 'surname': 'Aumiller', 'time': '15.3'},
12: {'forename': 'Lilia', 'surname': 'Armstrong', 'time': '11.31'},
13: {'forename': 'Lilia', 'surname': 'Armstrong', 'time': '8.84'},
14: {'forename': 'Martha', 'surname': 'Woods', 'time': '9.01'},
15: {'forename': 'Diane', 'surname': 'Lauria', 'time': '8.99'},
16: {'forename': 'Yvonne', 'surname': 'Pumphrey', 'time': '8.84'}
}

I did the following code:

# Function with the for loop
def find_fastest_athletes(df,distance,gender,ages):
  for age in range(10,16):
    fastest_athletes = df[(df["gender"] == gender) & (df["age"] == age)]
    fastest_athletes_sorted = fastest_athletes.sort_values(distance,ascending=True)
    fastest_athletes_value = fastest_athletes_sorted.iloc[[0]][["forename","surname","100m"]]
    athletes_data = fastest_athletes_value.to_string(index=False, header=False).split('  ')
    athletes_data_dict = {
        'forename': athletes_data[0].strip(),
        'surname': athletes_data[1],
        'time': float(athletes_data[2])
        }
  return athletes_data_dict
  
# Main function
def main(filename='athletes.csv'):
    df = pd.read_csv(filename, index_col=0)
    df['100m'] = df['100m'].astype(float)
    print(find_fastest_athletes(df,'100m','F',[10,11,12,13,14,15,16]))
    return
   
if __name__ == "__main__":
  main()  
With my code I get as output ONLY the fastest athlete for the last age (16 years old) and not the fastest athlete for EACH age (10, 11, 12, 13, 14, 15, 16). Why is that?

Also how can I add the age at the beginning of each line?
#2
It seems that in your "athletes_data_dict" you are using fixed key names (forename,surname,time),
over and over again. Keys should be unique in a dictionary.

Paul
#3
(Dec-01-2021, 04:21 PM)DPaul Wrote: It seems that in your "athletes_data_dict" you are using fixed key names (forename,surname,time),
over and over again. Keys should be unique in a dictionary.

Paul

If I take out the "for age in range(10,16)" and run the code for only one age (e.g. 16) it works perfectly.
The problem is when I want the fastest athlete for each age, I get only the last of the loop.

I think it may be related to how I wrote the for loop, but I have tried different ways and I still get only one output instead of 7.
Reply
#4
Each time through the loop you create (and overwrite if already created) athletes_data_dict. You're not storing the one for each age anywhere. You need a collection like a list and append the dict for each age to it.

Then return the collection instead of athletes_data_dict which has only the last age in it.
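
To make the overwrite visible, here is a minimal sketch with toy data and no pandas. The function names `last_only` and `keep_all` are made up for illustration and are not from the original code; the point is only the difference between rebinding one variable and appending to a list.

```python
# Toy stand-in for "one result per age"; names here are hypothetical.
ages = [10, 11, 12]

def last_only(ages):
    for age in ages:
        result = {"age": age}    # the name `result` is rebound on every pass
    return result                # so only the dict from the final pass survives

def keep_all(ages):
    results = []                 # a collection that lives across iterations
    for age in ages:
        results.append({"age": age})
    return results

print(last_only(ages))  # {'age': 12}
print(keep_all(ages))   # [{'age': 10}, {'age': 11}, {'age': 12}]
```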
#5
(Dec-01-2021, 05:28 PM)bowlofred Wrote: Each time through the loop you create (and overwrite if already created) athletes_data_dict. You're not storing the one for each age anywhere. You need a collection like a list and append the dict for each age to it.

Then return the collection instead of athletes_data_dict which has only the last age in it.

Thanks for that, now I'm starting to understand how it works. I added an empty list called collection inside the loop of the function, so that the output for each age is appended to it, as shown below:

def find_fastest_athletes(df,distance,gender,age):
  for age in range(11,17,1):
    fastest_athletes = df[(df["gender"] == gender) & (df["age"] == age)]
    fastest_athletes_sorted = fastest_athletes.sort_values(distance,ascending=True)
    fastest_athletes_value = fastest_athletes_sorted.iloc[[0]][["forename","surname","100m"]]
    athletes_data = fastest_athletes_value.to_string(index=False, header=False).split('  ')
    athletes_data_dict = {
        'forename': athletes_data[0].strip(),
        'surname': athletes_data[1],
        'time': float(athletes_data[2])
        }
    collection=[]
    collection.append(athletes_data_dict)
  return collection


But now I don't understand why I'm still getting only the last fastest athlete (16 years old). Shouldn't each pass through the loop now add the new athlete to the collection list?
#6
You're creating a (new, empty) collection inside the loop. So each time through you throw away the old one.

Create the collection outside the loop.
append to it inside the loop.
Return the collection after the loop.
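
Those three steps could look roughly like the sketch below. It is an assumption-laden illustration and not the original poster's code: the DataFrame is built inline from four rows of the CSV in post #1 so it runs on its own, the row is read directly with `iloc[0]` instead of going through `to_string`, and skipping ages with no matching athlete is my own choice.

```python
import pandas as pd

# Four rows copied from the CSV in post #1, inlined so the sketch is self-contained.
df = pd.DataFrame({
    "forename": ["Valerie", "Roy", "Betty", "Martha"],
    "surname": ["Lee", "Miller", "Stephenson", "Woods"],
    "gender": ["F", "M", "F", "F"],
    "age": [10, 10, 14, 14],
    "100m": [17.23, 19.18, 11.04, 9.01],
})

def find_fastest_athletes(df, distance, gender, ages):
    collection = []                                   # 1. create once, outside the loop
    for age in ages:
        group = df[(df["gender"] == gender) & (df["age"] == age)]
        if group.empty:                               # assumption: skip ages with no athletes
            continue
        fastest = group.sort_values(distance).iloc[0] # row with the smallest time
        collection.append({                           # 2. append inside the loop
            "age": age,
            "forename": fastest["forename"],
            "surname": fastest["surname"],
            "time": float(fastest[distance]),
        })
    return collection                                 # 3. return after the loop

result = find_fastest_athletes(df, "100m", "F", [10, 12, 14])
print(result)
```

With this sample, age 12 has no rows and is skipped, so the list holds one entry for age 10 and one for age 14.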
#7
(Dec-01-2021, 07:50 PM)bowlofred Wrote: You're creating a (new, empty) collection inside the loop. So each time through you throw away the old one.

Create the collection outside the loop.
append to it inside the loop.
Return the collection after the loop.

Many thanks for that. I think I now have the full code working, please see below:

def find_fastest_athletes(df,distance,gender,ages):
  data=[]
  for age in ages:
    fastest_athletes = df.loc[(df.gender == gender) & (df.age == age)]
    fastest_athletes_sorted = fastest_athletes.sort_values(distance,ascending=True)
    fastest_athletes_value = fastest_athletes_sorted.iloc[[0]][["forename","surname",distance]]
    athletes_data = fastest_athletes_value.to_string(index=False, header=False).split('  ')
    athletes_data_dict ={
        'forename': athletes_data[0].strip(),
        'surname': athletes_data[1],
        'time': float(athletes_data[2])
    }
    athletes_data_dict_num = (age,athletes_data_dict)
    data.append(athletes_data_dict_num)
  return data

def main(filename='athletes.csv'):
    df = pd.read_csv(filename, index_col=0)
    df['100m'] = df['100m'].astype(float)
    print(find_fastest_athletes(df,'100m','F',[10,11,12,13,14,15,16]))
    return
By the way, one small thing: I now get all 7 outputs on one single line. How can I get them on 7 separate lines when I append the data? I looked all over the Internet for the right command to go to the next line when using "append", but with no success.
#8
Not sure what you mean by a line. You have a data structure. Looks like data is a list. It contains tuples of (age, athlete_data), and athlete_data is a dict of forename,surname,time.

You can print any part of it however you want.

athletes = find_fastest_athletes(df,'100m','F',[10,11,12,13,14,15,16])
for age, athlete_info in athletes:
    print(f"Age:{age} - Info:{athlete_info}")
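
If the goal is the age-keyed layout from post #1, one option (a sketch, not the only way) is to turn the list of tuples into a dict and print one entry per line. The `athletes` list below is a hand-written sample of what the function from post #7 returns, with values taken from the CSV; the names `by_age` and `lines` are just for illustration.

```python
# Hand-written sample of the function's return value: a list of (age, info) tuples.
athletes = [
    (10, {"forename": "Valerie", "surname": "Lee", "time": 17.23}),
    (11, {"forename": "Aida", "surname": "Aumiller", "time": 15.3}),
]

# A list of 2-tuples converts directly into a dict: {age: info}.
by_age = dict(athletes)

# Build one text line per age, then join them with newlines for printing.
lines = [f"{age}: {info}" for age, info in by_age.items()]
print("\n".join(lines))
# 10: {'forename': 'Valerie', 'surname': 'Lee', 'time': 17.23}
# 11: {'forename': 'Aida', 'surname': 'Aumiller', 'time': 15.3}
```

Note that appending to a list never adds line breaks; the data structure has no lines at all, and the layout is decided only when you print it.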