for loop in dataframe in pandas

Paulman · Dec-01-2021, 02:35 PM

Hello,

I have a problem with a for loop over a pandas DataFrame and hope somebody can help.

I have the following dataframe in a csv file:

,forename,surname,gender,age,100m,200m,400m,800m,1500m
0,Migdalia,Parrish,F,18,11.08,29.0,59.41,122.05,259.11
1,Valerie,Lee,F,10,17.23,46.0,100.02,232.64,480.95
2,John,Debnam,M,17,10.81,25.89,50.6,110.29,232.39
3,Roy,Miller,M,10,19.18,46.74,95.32,201.14,430.27
4,Aida,Aumiller,F,11,15.3,41.83,81.06,189.03,394.9
5,Marcia,Brown,F,19,11.13,24.62,57.59,119.13,256.37
6,Harry,Knows,M,16,12.39,25.94,49.67,106.56,237.14
7,Barry,Lennon,M,14,11.15,23.56,46.46,110.89,230.49
8,Lilia,Armstrong,F,13,8.84,25.09,59.54,128.95,258.47
9,Johnny,Casey,M,15,9.65,22.67,49.46,112.85,233.87
10,Donald,Taylor,M,15,11.74,22.42,49.22,114.62,224.63
11,Martha,Woods,F,14,9.01,24.34,55.25,118.8,254.87
12,Diane,Lauria,F,15,8.99,27.92,54.79,119.89,249.21
13,Yvonne,Pumphrey,F,16,8.84,27.29,57.63,123.13,247.41
14,Betty,Stephenson,F,14,11.04,28.73,59.05,126.29,256.44
15,Lilia,Armstrong,F,12,11.31,34.43,74.28,150.05,321.07

I have to create a main function that calls another function which, using a for loop, retrieves the fastest time for each age (10, 11, 12, 13, 14, 15, 16) for a specific gender (e.g. 'F') and distance (e.g. '100m').

For example:
Input:
fastest_athletes = find_fastest_athletes(df,"100m","F",[10,11,12,13,14,15,16])
Output:
{
10: {'forename': 'Valerie', 'surname': 'Lee', 'time': '17.23'},
11: {'forename': 'Aida', 'surname': 'Aumiller', 'time': '15.3'},
12: {'forename': 'Lilia', 'surname': 'Armstrong', 'time': '11.31'},
13: {'forename': 'Lilia', 'surname': 'Armstrong', 'time': '8.84'},
14: {'forename': 'Martha', 'surname': 'Woods', 'time': '9.01'},
15: {'forename': 'Diane', 'surname': 'Lauria', 'time': '8.99'},
16: {'forename': 'Yvonne', 'surname': 'Pumphrey', 'time': '8.84'}
}

I wrote the following code:

import pandas as pd

# Function with the for loop
def find_fastest_athletes(df,distance,gender,ages):
  for age in range(10,16):
    fastest_athletes = df[(df["gender"] == gender) & (df["age"] == age)]
    fastest_athletes_sorted = fastest_athletes.sort_values(distance,ascending=True)
    fastest_athletes_value = fastest_athletes_sorted.iloc[[0]][["forename","surname","100m"]]
    athletes_data = fastest_athletes_value.to_string(index=False, header=False).split('  ')
    athletes_data_dict = {
        'forename': athletes_data[0].strip(),
        'surname': athletes_data[1],
        'time': float(athletes_data[2])
        }
  return athletes_data_dict
  
# Main function
def main(filename='athletes.csv'):
    df = pd.read_csv(filename, index_col=0)
    df['100m'] = df['100m'].astype(float)
    print(find_fastest_athletes(df,'100m','F',[10,11,12,13,14,15,16]))
    return
   
if __name__ == "__main__":
  main()

With my code, the output is ONLY the fastest athlete for the last age (16 years old) and not the fastest athlete for each age (10, 11, 12, 13, 14, 15, 16). Why is that?

Also how can I add the age at the beginning of each line?

DPaul · (This post was last modified: Dec-01-2021, 04:21 PM by DPaul.)

It seems that in your "athletes_data_dict" you are using fixed key names (forename,surname,time),
over and over again. Keys should be unique in a dictionary.

Paul

Paulman · Dec-01-2021, 04:39 PM

(Dec-01-2021, 04:21 PM)DPaul Wrote: It seems that in your "athletes_data_dict" you are using fixed key names (forename,surname,time),
over and over again. Keys should be unique in a dictionary.

Paul

If I take out the "for age in range(10,16)" and run the code for only one age (e.g. 16), it works perfectly.
The problem is that when I want the fastest athlete for each age, I get only the last one from the loop.

I think it may be related to how I wrote the for loop, but I tried different approaches and still get only one output instead of 7.

bowlofred · Dec-01-2021, 05:28 PM

Each time through the loop you create (and overwrite if already created) athletes_data_dict. You're not storing the one for each age anywhere. You need a collection like a list and append the dict for each age to it.

Then return the collection instead of athletes_data_dict which has only the last age in it.
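To illustrate the point above, here is a minimal sketch of the difference between rebinding a name on every pass and accumulating results in a list. The `lookup` helper and the function names are hypothetical stand-ins for the per-age pandas query, not part of the original code.

```python
def lookup(age):
    # Hypothetical stand-in for "find the fastest athlete of this age".
    return {"age": age, "label": f"athlete-{age}"}


def last_only(ages):
    for age in ages:
        result = lookup(age)  # rebinds `result` on every pass
    return result  # only the value from the final pass is left


def collect_all(ages):
    results = []  # created once, before the loop starts
    for age in ages:
        results.append(lookup(age))  # one entry is kept per age
    return results


print(last_only([10, 11, 12]))
print(collect_all([10, 11, 12]))
```

The first function behaves like the code in the question: every pass replaces the previous dict, so the caller only ever sees the last one.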

Paulman · Dec-01-2021, 06:25 PM

(Dec-01-2021, 05:28 PM)bowlofred Wrote: Each time through the loop you create (and overwrite if already created) athletes_data_dict. You're not storing the one for each age anywhere. You need a collection like a list and append the dict for each age to it.

Then return the collection instead of athletes_data_dict which has only the last age in it.

Thanks for that; now I'm starting to understand how it works. Inside the loop of the function I added an empty list called collection, and I append the output for each age to it, as shown below:

def find_fastest_athletes(df,distance,gender,age):
  for age in range(11,17,1):
    fastest_athletes = df[(df["gender"] == gender) & (df["age"] == age)]
    fastest_athletes_sorted = fastest_athletes.sort_values(distance,ascending=True)
    fastest_athletes_value = fastest_athletes_sorted.iloc[[0]][["forename","surname","100m"]]
    athletes_data = fastest_athletes_value.to_string(index=False, header=False).split('  ')
    athletes_data_dict = {
        'forename': athletes_data[0].strip(),
        'surname': athletes_data[1],
        'time': float(athletes_data[2])
        }
    collection=[]
    collection.append(athletes_data_dict)
  return collection

But now I don't understand why I'm still getting only the last fastest athlete (16 years old). Shouldn't each pass through the loop now add the new athlete to the collection list?

bowlofred · Dec-01-2021, 07:50 PM

You're creating a (new, empty) collection inside the loop. So each time through you throw away the old one.

Create the collection outside the loop.
append to it inside the loop.
Return the collection after the loop.
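The three steps above could be sketched as follows, assuming a dict keyed by age is wanted (as in the expected output from the first post). Two things here are swapped in for illustration and are not from the thread: the fastest row is picked with `idxmin` instead of sorting and splitting a string, and ages with no matching athlete are skipped. The sample rows are copied from the CSV in the first post.

```python
import pandas as pd


def find_fastest_athletes(df, distance, gender, ages):
    fastest = {}  # created once, outside the loop
    for age in ages:
        group = df[(df["gender"] == gender) & (df["age"] == age)]
        if group.empty:
            # Assumption: silently skip ages that have no athlete.
            continue
        # idxmin returns the index label of the smallest time in the column.
        row = group.loc[group[distance].idxmin()]
        fastest[age] = {  # added to the collection inside the loop
            "forename": row["forename"],
            "surname": row["surname"],
            "time": float(row[distance]),
        }
    return fastest  # returned after the loop has finished


# A few rows taken from the CSV in the first post.
df = pd.DataFrame({
    "forename": ["Valerie", "Aida", "Martha", "Betty"],
    "surname": ["Lee", "Aumiller", "Woods", "Stephenson"],
    "gender": ["F", "F", "F", "F"],
    "age": [10, 11, 14, 14],
    "100m": [17.23, 15.3, 9.01, 11.04],
})

result = find_fastest_athletes(df, "100m", "F", [10, 11, 14, 16])
print(result)
```

Using the `distance` argument for the column lookup means the same function should also work for "200m", "400m" and so on, rather than being tied to "100m".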

Paulman · Dec-01-2021, 10:55 PM

(Dec-01-2021, 07:50 PM)bowlofred Wrote: You're creating a (new, empty) collection inside the loop. So each time through you throw away the old one.

Create the collection outside the loop.
append to it inside the loop.
Return the collection after the loop.

Many thanks for that. I think I now have the full code working; please see below:

def find_fastest_athletes(df,distance,gender,ages):
  data=[]
  for age in ages:
    fastest_athletes = df.loc[(df.gender == gender) & (df.age == age)]
    fastest_athletes_sorted = fastest_athletes.sort_values(distance,ascending=True)
    fastest_athletes_value = fastest_athletes_sorted.iloc[[0]][["forename","surname",distance]]
    athletes_data = fastest_athletes_value.to_string(index=False, header=False).split('  ')
    athletes_data_dict ={
        'forename': athletes_data[0].strip(),
        'surname': athletes_data[1],
        'time': float(athletes_data[2])
    }
    athletes_data_dict_num = (age,athletes_data_dict)
    data.append(athletes_data_dict_num)
  return data

def main(filename='athletes.csv'):
    df = pd.read_csv(filename, index_col=0)
    df['100m'] = df['100m'].astype(float)
    print(find_fastest_athletes(df,'100m','F',[10,11,12,13,14,15,16]))
    return

Btw, just a small thing: I now have all 7 outputs on one single line. How can I get them on 7 separate lines when I append the data? I looked all over the Internet for the right command to go to the next line when using "append", but with no success.

bowlofred · Dec-02-2021, 12:15 AM

Not sure what you mean by a line. You have a data structure. Looks like data is a list. It contains tuples of (age, athlete_data), and athlete_data is a dict of forename,surname,time.

You can print any part of it however you want.

athletes = find_fastest_athletes(df,'100m','F',[10,11,12,13,14,15,16])
for age, athlete_info in athletes:
    print(f"Age:{age} - Info:{athlete_info}")
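As a further sketch of the same idea: the list itself has no "lines", so the layout is decided only when printing. The example below assumes the list-of-tuples shape described above and uses two sample entries taken from the CSV in the first post; the `format_lines` helper is a hypothetical name introduced for illustration.

```python
# Sample data in the shape returned by find_fastest_athletes:
# a list of (age, info) tuples.
athletes = [
    (10, {"forename": "Valerie", "surname": "Lee", "time": 17.23}),
    (11, {"forename": "Aida", "surname": "Aumiller", "time": 15.3}),
]


def format_lines(athletes):
    # Build one string per athlete, with the age at the start of each.
    return [f"{age}: {info}" for age, info in athletes]


# Joining with a newline prints each athlete on its own line.
print("\n".join(format_lines(athletes)))

# A list of (key, value) tuples converts directly to a dict keyed by age.
by_age = dict(athletes)
print(by_age[11]["surname"])
```

Converting with `dict(athletes)` gives the age-keyed structure shown in the expected output of the first post, if that shape is preferred over a list.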
