Posts: 16
Threads: 2
Joined: Mar 2022
Dec-30-2022, 04:50 PM
(This post was last modified: Dec-30-2022, 09:45 PM by KoinKoin.)
Hello everyone,
I modified my code, it should be faster than the last one above, but still pretty long when I need to find 5 of 60...
Instead of comparing every combination like before, I putting in a dict index/score, then I sort by score and looking at the first indexes where sum-average) - sum(MVP) is under 121:
1 2 3 4 5 6 7 8 9 10 |
df1 = pd.DataFrame.from_dict(data)
for index in list (combinations(df1.index, 5 )):
dictIndexScore[index] = df1.loc[index,:][ "Score1" ]. sum (axis = 0 )
for i in sorted (dictIndexScore.items(), key = lambda x:x[ 1 ], reverse = True ):
if df1.loc[i[ 0 ],:][ "10Aver1" ]. sum (axis = 0 ) - df1.loc[i[ 0 ],:][ "10Aver1" ]. max (axis = 0 ) < 121 :
bestIndex = i[ 0 ]
bestSum = i[ 1 ]
break
print (df1.loc[bestIndex,:])
|
My program did 3366.6575298309326 seconds for a 60 rows dataframe, so 5461512 combinations possible of 5 in 60...
Do you have any tips/hints to go faster ?
I read numpy is faster than dataframe, but didn't really work yet with it
Posts: 6,824
Threads: 20
Joined: Feb 2020
Dec-31-2022, 04:47 AM
(This post was last modified: Dec-31-2022, 04:47 AM by deanhystad.)
I would not do this in numpy. For what you are doing, numpy just slows things down.
I wrote a python program that made 60 random players and picked the best team using combinations of 5 players. It takes 14 seconds to run.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
from itertools import combinations
from random import randint
from time import time
class Player:
def __init__( self , name):
self .name = name
self .avg = randint( 1 , 100 )
self .score = randint( 1 , 200 )
def __repr__( self ):
return f '{self.name} score={self.score}, avg={self.avg}'
class Team:
def __init__( self , players):
self .players = sorted (players, key = lambda x: x.score, reverse = True )
self .score = sum (p.score for p in players)
self .cap = sum (p.avg for p in players) - self .players[ 0 ].avg
def __repr__( self ):
players = '\n' .join( str (p) for p in self .players)
return f '{players}\nscore={self.score}, cap={self.cap}\n'
start = time()
players = [Player(i) for i in range ( 1 , 61 )]
teams = [Team(pics) for pics in combinations(players, 5 )]
teams = [team for team in teams if team.cap < = 120 ]
best_team = max (teams, key = lambda p: p.score)
print (time() - start)
print (best_team)
|
A big time reduction was realized when I excluded players scoring below the league average. Now my runtime is under a second.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
from itertools import combinations
from random import randint
from time import time
class Player:
def __init__( self , name):
self .name = name
self .avg = randint( 1 , 100 )
self .score = randint( 1 , 200 )
def __repr__( self ):
return f '{self.name} score={self.score}, avg={self.avg}'
class Team:
def __init__( self , players):
self .players = sorted (players, key = lambda x: x.score, reverse = True )
self .score = sum (p.score for p in players)
self .cap = sum (p.avg for p in players) - self .players[ 0 ].avg
def __repr__( self ):
players = '\n' .join( str (p) for p in self .players)
return f '{players}\nscore={self.score}, cap={self.cap}\n'
start = time()
players = [Player(i) for i in range ( 1 , 61 )]
avg_score = sum (p.score for p in players) / len (players)
players = [player for player in players if player.score > avg_score]
print ( len (players))
teams = [team for team in teams if team.cap < = 120 ]
best_team = max (teams, key = lambda p: p.score)
print (time() - start)
print (best_team)
|
Posts: 16
Threads: 2
Joined: Mar 2022
Dec-31-2022, 03:48 PM
(This post was last modified: Dec-31-2022, 03:51 PM by KoinKoin.)
Wow thank you very much @ deanhystad !
I adapted it to my script and work great!
May I ask the difference between my code and yours ?
I feel like the step was the same, only the method was different.
I was using a df for my players list :
1 |
df1 = pd.DataFrame.from_dict(data)
|
You used a class :
1 |
players = [Player(i) for i in range ( 1 , 61 )]
|
Then, combinations of teams + score :
1 2 |
for index in list (combinations(df1.index, 5 )):
dictIndexScore[index] = df1.loc[index,:][ "Score1" ]. sum (axis = 0 )
|
Again you used class to do the same :
1 |
teams = [Team(pics) for pics in combinations(players, 5 )]
|
Finally to get the best team, i was sorting my dict and test the avg value:
1 2 3 4 5 |
for i in sorted (dictIndexScore.items(), key = lambda x:x[ 1 ], reverse = True ):
if df1.loc[i[ 0 ],:][ "10Aver1" ]. sum (axis = 0 ) - df1.loc[i[ 0 ],:][ "10Aver1" ]. max (axis = 0 ) < 121 :
bestIndex = i[ 0 ]
bestSum = i[ 1 ]
break
|
You looked for the max when avg is under 121:
1 2 |
teams = [team for team in teams if team.cap < = 120 ]
best_team = max (teams, key = lambda p: p.score)
|
Using class instead of df is the reason why it's faster ?
Posts: 6,824
Threads: 20
Joined: Feb 2020
You should time each of the steps to determine which is slow. Knowing exactly what is slow should help illuminate why.
Posts: 4,804
Threads: 77
Joined: Jan 2018
(Dec-31-2022, 04:25 PM)deanhystad Wrote: You should time each of the steps to determine which is slow. Knowing exactly what is slow should help illuminate why. Run the code with
Output: python -m cProfile myscript.py
Posts: 16
Threads: 2
Joined: Mar 2022
I already know that this code :
1 2 |
for index in list (combinations(df1.index, 5 )):
dictIndexScore[index] = df1.loc[index,:][ "Score1" ]. sum (axis = 0 )
|
is taking long time, I already saw that without cProfile, I used some "time() - start" to see that.
I was just curious to understand, because I never really used class and I wanted to know if it was the reason...
But huge thanks for your help.
Posts: 6,824
Threads: 20
Joined: Feb 2020
You should find out if making the combinations or computing the dcitIndexScore is taking all the time. My guess is the latter.
Classes are just a way to organize code. Classes are used to make code more compact, more flexible, and easier to understand. Classes usually don't have any speed of execution benefit. I could write the same code without classes and it would execute just as fast.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
from itertools import combinations
from random import randint
from time import time
def Player(name):
return (name, randint( 1 , 200 ), randint( 1 , 100 ))
def player_str(player):
return f '{player[0]} score={player[1]}, avg={player[2]}'
def Team(players):
return (players, sum (p[ 1 ] for p in players), sum (p[ 2 ] for p in players), - players[ 0 ][ 1 ])
def team_str(team):
players = '\n' .join(player_str(p) for p in team[ 0 ])
return f '{players}\nscore={team[1]}, cap={team[2]}\n'
start = time()
players = [Player(i) for i in range ( 1 , 61 )]
players = sorted (players, key = lambda p: p[ 1 ], reverse = True )
teams = [Team(pics) for pics in combinations(players, 5 )]
teams = [team for team in teams if team[ 2 ] < = 120 ]
best_team = max (teams, key = lambda t: t[ 1 ])
print (time() - start)
print (team_str(best_team))
|
Posts: 16
Threads: 2
Joined: Mar 2022
Jan-02-2023, 09:44 PM
(This post was last modified: Jan-02-2023, 09:44 PM by KoinKoin.)
I might be crazy asking that, but what if I have like 400 players and I need the best 5 ?
It's like 84 billions combinations possible...
I tried to play with that but I got a "Memory error" at some point"
Do you think it's a good option to go this way :
1 2 |
teams = [team for team in [Team(pics) for pics in combinations(players, 5 )] if team.cap < = 120 ]
best_team = max (teams, key = lambda p: p.score)
|
Posts: 6,824
Threads: 20
Joined: Feb 2020
Jan-04-2023, 06:15 AM
(This post was last modified: Jan-04-2023, 06:15 AM by deanhystad.)
If you want to do 400 players and not do anything to reduce cull the list (remove players with low scores and/or poor value) you can get a supercomputer.
Memory is not a problem. You are probably making a list to hold all the teams. Instead, you should do one team at a time. Itertools.combinations is a lazy iterator and generates combinations when requested.
Pseudo code:
1 2 3 4 5 6 |
best_team = None
for team in itertools.combinations(players, 5 ):
if team.cap_score < = 120 :
if best_team is None or best_team.score < team.score:
best_team = team
print (best_team)
|
That fixes the memory problems, but you still have the time problems. I was playing around measuring times for different numbers of players,
30 players = 0.27 seconds
40 players = 1.38 seconds
50 players = 4.52 seconds
60 players = 11.59 seconds
70 players = 26.55 seconds
400 players = 53 hours (estimated)
The easiest way to speed things up is reduce the number of players. Ignore players scoring under the league average and or players that have a low score/average value.
But even if you threw away 200 players, you would still have 200 players to evaluate. Extrapolating from my tests I would expect it to take about 1.6 hours to find the best team that could be made from a pool of 200 players.
You might be able to speed things up by reducing the player list as you are building the team. This is a recursive solution that skips players a team cannot afford. Depending on how early in the search you skip a player this can prevent having to test thousands of teams that don't fit under the cap.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 |
from random import randint
from time import time
class Player:
def __init__( self , id ):
self . id = id
self .avg = randint( 0 , 100 )
self .score = max ( 0 , self .avg + randint( - 50 , 50 ))
def __str__( self ):
return self .__repr__()
def __repr__( self ):
return f '{self.id} avg={self.avg} score={self.score}'
class Team:
def __init__( self ):
self ._players = []
self .score = 0
self .cap = 120
def __str__( self ):
return self .__repr__()
def __repr__( self ):
players = '\n' .join( str (p) for p in self ._players)
return f '{players}\ncap={self.cap} score={self.score}\n'
def push( self , player):
if self ._players:
self .cap - = player.avg
self .score + = player.score
self ._players.append(player)
def pop( self ):
player = self ._players.pop()
if self ._players:
self .cap + = player.avg
self .score - = player.score
def complete( self ):
return len ( self .players) > = 5
@property
def players( self ):
return self ._players
@players .setter
def players( self , roster):
self ._players[:] = roster
self .cap = 120 - sum (p.avg for p in roster[ 1 :])
self .score = sum (p.score for p in roster)
def get_best_team(players, index = 0 , team = None , best_team = None ):
if team is None :
team = Team()
best_team = Team()
get_best_team.counter = 0
if team.complete():
get_best_team.counter + = 1
if team.score > best_team.score:
best_team.players = team.players
return best_team
for i in range (index, len (players)):
if players[i].avg < = team.cap:
team.push(players[i])
get_best_team(players, i + 1 , team, best_team)
team.pop()
return best_team
players = sorted ([Player(i) for i in range ( 1 , 61 )], key = lambda p:p.avg, reverse = True )
start = time()
print (get_best_team(players))
print (time() - start, get_best_team.counter)
|
I ran some tests comparing computing combinations and using the recursive looping function. Time is in seconds.
Output: #Teams Comb Time Loop Time
40 1.38 0.11
50 4.52 0.44
60 11.59 0.75
70 26.55 1.75
80 67.20 3.58
Timing is less deterministic using this approach, but a very rough estimate for creating teams with a pool of 200 players is 6.3 minutes, and 3.5 hours for a pool of 400 players. When I wrote the program in C it takes under 5 minutes to solve for a pool of 400 players.
In case you were wondering how loops can replace combinations, just take a look at the combinations generated here:
1 2 |
from itertools import combinations
print ( list (combinations(( 1 , 2 , 3 , 4 ), 2 )))
|
Output: [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
And the output from this program:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
def combinations(numbers, index = 0 , combo = None , combos = None ):
if combo is None :
combo = []
combos = []
if len (combo) > = 2 :
combos.append(combo.copy())
return combos
for i in range (index, len (numbers)):
combo.append(numbers[i])
combinations(numbers, i + 1 , combo, combos)
combo.pop()
return combos
print (combinations(( 1 , 2 , 3 , 4 )))
|
Output: [[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]
Posts: 16
Threads: 2
Joined: Mar 2022
Jan-04-2023, 09:44 AM
(This post was last modified: Jan-04-2023, 09:44 AM by KoinKoin.)
(Jan-04-2023, 06:15 AM)deanhystad Wrote: Timing is less deterministic using this approach, but a very rough estimate for creating teams with a pool of 200 players is 6.3 minutes, and 3.5 hours for a pool of 400 players. When I wrote the program in C it takes under 5 minutes to solve for a pool of 400 players. OMG ! That's impressive ! Guess I'll have to look for that langage I didn't really like and didn't do for more than 10 years hahaha
Did you kinda "convert" your last program (with the "get_best_team") to C or you used something like "Cython" (I just read about it, never heard of it before) ?
I'm really so impress by the time difference... 5 minutes versus houuuuuuurs
|