Loop to find the best combination/score

KoinKoin · (This post was last modified: Dec-30-2022, 09:45 PM by KoinKoin.)

Hello everyone,

I modified my code, it should be faster than the last one above, but still pretty long when I need to find 5 of 60...
Instead of comparing every combination like before, I putting in a dict index/score, then I sort by score and looking at the first indexes where sum-average) - sum(MVP) is under 121:

        
              df1 = pd.DataFrame.from_dict(data)
for index in list(combinations(df1.index,5)):
    dictIndexScore[index] = df1.loc[index,:]["Score1"].sum(axis=0)
for i in sorted(dictIndexScore.items(), key=lambda x:x[1], reverse=True):
    if df1.loc[i[0],:]["10Aver1"].sum(axis=0) - df1.loc[i[0],:]["10Aver1"].max(axis=0) < 121:
        bestIndex = i[0]
        bestSum = i[1]
        break
 
print(df1.loc[bestIndex,:])

My program did 3366.6575298309326 seconds for a 60 rows dataframe, so 5461512 combinations possible of 5 in 60...

Do you have any tips/hints to go faster ?
I read numpy is faster than dataframe, but didn't really work yet with it

**deanhystad** · (This post was last modified: Dec-31-2022, 04:47 AM by deanhystad.)

I would not do this in numpy. For what you are doing, numpy just slows things down.

I wrote a python program that made 60 random players and picked the best team using combinations of 5 players. It takes 14 seconds to run.

        
          
          
              
              from itertools import combinations
from random import randint
from time import time
 
class Player:
    def __init__(self, name):
        self.name = name
        self.avg = randint(1, 100)
        self.score = randint(1, 200)
 
    def __repr__(self):
        return f'{self.name} score={self.score}, avg={self.avg}'
 
 
class Team:
    def __init__(self, players):
        self.players = sorted(players, key=lambda x: x.score, reverse=True)
        self.score = sum(p.score for p in players)
        self.cap = sum(p.avg for p in players) - self.players[0].avg
 
    def __repr__(self):
        players = '\n'.join(str(p) for p in self.players)
        return f'{players}\nscore={self.score}, cap={self.cap}\n'
 
 
start = time()
players = [Player(i) for i in range(1, 61)]
teams = [Team(pics) for pics in combinations(players, 5)]
teams = [team for team in teams if team.cap <= 120]
best_team = max(teams, key=lambda p: p.score)
print(time() - start)
print(best_team)

            

        
      

A big time reduction was realized when I excluded players scoring below the league average. Now my runtime is under a second.

        
          
          
              
              from itertools import combinations
from random import randint
from time import time
 
class Player:
    def __init__(self, name):
        self.name = name
        self.avg = randint(1, 100)
        self.score = randint(1, 200)
 
    def __repr__(self):
        return f'{self.name} score={self.score}, avg={self.avg}'
 
 
class Team:
    def __init__(self, players):
        self.players = sorted(players, key=lambda x: x.score, reverse=True)
        self.score = sum(p.score for p in players)
        self.cap = sum(p.avg for p in players) - self.players[0].avg
 
    def __repr__(self):
        players = '\n'.join(str(p) for p in self.players)
        return f'{players}\nscore={self.score}, cap={self.cap}\n'
 
 
start = time()
players = [Player(i) for i in range(1, 61)]
avg_score = sum(p.score for p in players) / len(players)
players = [player for player in players if player.score > avg_score]
print(len(players))
teams = [team for team in teams if team.cap <= 120]
best_team = max(teams, key=lambda p: p.score)
print(time() - start)
print(best_team)

            

        
      

KoinKoin · (This post was last modified: Dec-31-2022, 03:51 PM by KoinKoin.)

Wow thank you very much @deanhystad !
I adapted it to my script and work great!

May I ask the difference between my code and yours ?
I feel like the step was the same, only the method was different.

I was using a df for my players list :

        
              df1 = pd.DataFrame.from_dict(data)

You used a class :

        
              players = [Player(i) for i in range(1, 61)]

Then, combinations of teams + score :

        
              for index in list(combinations(df1.index,5)):
    dictIndexScore[index] = df1.loc[index,:]["Score1"].sum(axis=0)

Again you used class to do the same :

        
              teams = [Team(pics) for pics in combinations(players, 5)]

Finally to get the best team, i was sorting my dict and test the avg value:

        
              for i in sorted(dictIndexScore.items(), key=lambda x:x[1], reverse=True):
    if df1.loc[i[0],:]["10Aver1"].sum(axis=0) - df1.loc[i[0],:]["10Aver1"].max(axis=0) < 121:
        bestIndex = i[0]
        bestSum = i[1]
        break

You looked for the max when avg is under 121:

        
              teams = [team for team in teams if team.cap <= 120]
best_team = max(teams, key=lambda p: p.score)

Using class instead of df is the reason why it's faster ?

**deanhystad** · Dec-31-2022, 04:25 PM

You should time each of the steps to determine which is slow. Knowing exactly what is slow should help illuminate why.

**Gribouillis** · Dec-31-2022, 04:44 PM

(Dec-31-2022, 04:25 PM)deanhystad Wrote: You should time each of the steps to determine which is slow. Knowing exactly what is slow should help illuminate why.

Run the code with

Output:
python -m cProfile myscript.py

KoinKoin · Dec-31-2022, 05:44 PM

I already know that this code :

        
              for index in list(combinations(df1.index,5)):
    dictIndexScore[index] = df1.loc[index,:]["Score1"].sum(axis=0)

is taking long time, I already saw that without cProfile, I used some "time() - start" to see that.

I was just curious to understand, because I never really used class and I wanted to know if it was the reason...
But huge thanks for your help.

**deanhystad** · Dec-31-2022, 06:49 PM

You should find out if making the combinations or computing the dcitIndexScore is taking all the time. My guess is the latter.

Classes are just a way to organize code. Classes are used to make code more compact, more flexible, and easier to understand. Classes usually don't have any speed of execution benefit. I could write the same code without classes and it would execute just as fast.

        
              from itertools import combinations
from random import randint
from time import time
 
def Player(name):
    return (name, randint(1, 200), randint(1, 100))
 
def player_str(player):
    return f'{player[0]} score={player[1]}, avg={player[2]}'
 
def Team(players):
    return (players, sum(p[1] for p in players), sum(p[2] for p in players), - players[0][1])
 
def team_str(team):
    players = '\n'.join(player_str(p) for p in team[0])
    return f'{players}\nscore={team[1]}, cap={team[2]}\n'
 
start = time()
players = [Player(i) for i in range(1, 61)]
players = sorted(players, key=lambda p: p[1], reverse=True)
teams = [Team(pics) for pics in combinations(players, 5)]
teams = [team for team in teams if team[2] <= 120]
best_team = max(teams, key=lambda t: t[1])
print(time() - start)
print(team_str(best_team))

KoinKoin · (This post was last modified: Jan-02-2023, 09:44 PM by KoinKoin.)

I might be crazy asking that, but what if I have like 400 players and I need the best 5 ?
It's like 84 billions combinations possible...
I tried to play with that but I got a "Memory error" at some point" Doh

Do you think it's a good option to go this way :

        
              teams = [team for team in [Team(pics) for pics in combinations(players, 5)] if team.cap <= 120]
best_team = max(teams, key=lambda p: p.score)

**deanhystad** · (This post was last modified: Jan-04-2023, 06:15 AM by deanhystad.)

If you want to do 400 players and not do anything to reduce cull the list (remove players with low scores and/or poor value) you can get a supercomputer.

Memory is not a problem. You are probably making a list to hold all the teams. Instead, you should do one team at a time. Itertools.combinations is a lazy iterator and generates combinations when requested.

Pseudo code:

        
              best_team = None
for team in itertools.combinations(players, 5):
    if team.cap_score <= 120:
        if best_team is None or best_team.score < team.score:
            best_team = team
print(best_team)

That fixes the memory problems, but you still have the time problems. I was playing around measuring times for different numbers of players,

30 players = 0.27 seconds
40 players = 1.38 seconds
50 players = 4.52 seconds
60 players = 11.59 seconds
70 players = 26.55 seconds
400 players = 53 hours (estimated)

The easiest way to speed things up is reduce the number of players. Ignore players scoring under the league average and or players that have a low score/average value.

But even if you threw away 200 players, you would still have 200 players to evaluate. Extrapolating from my tests I would expect it to take about 1.6 hours to find the best team that could be made from a pool of 200 players.

You might be able to speed things up by reducing the player list as you are building the team. This is a recursive solution that skips players a team cannot afford. Depending on how early in the search you skip a player this can prevent having to test thousands of teams that don't fit under the cap.

        
          
          
              
              from random import randint
from time import time
 
class Player:
    """Player info"""
    def __init__(self, id):
        self.id = id
        self.avg = randint(0, 100)
        self.score = max(0, self.avg +randint(-50, 50))
 
    def __str__(self):
        return self.__repr__()
 
    def __repr__(self):
        return f'{self.id} avg={self.avg} score={self.score}'
 
class Team:
    """A team with 5 players."""
    def __init__(self):
        self._players = []
        self.score = 0
        self.cap = 120
 
    def __str__(self):
        return self.__repr__()
 
    def __repr__(self):
        players = '\n'.join(str(p) for p in self._players)
        return f'{players}\ncap={self.cap} score={self.score}\n'
 
    def push(self, player):
        """Add player to the roster"""
        if self._players:
            self.cap -= player.avg
        self.score += player.score
        self._players.append(player)
 
    def pop(self):
        """Remove last player added to the roster"""
        player = self._players.pop()
        if self._players:
            self.cap += player.avg
        self.score -= player.score
 
    def complete(self):
        """Return True if team has all 5 players"""
        return len(self.players) >= 5
 
    @property
    def players(self):
        """Return list of players"""
        return self._players
 
    @players.setter
    def players(self, roster):
        """Set list of players"""
        self._players[:] = roster
        self.cap = 120 - sum(p.avg for p in roster[1:])
        self.score = sum(p.score for p in roster)
 
 
def get_best_team(players, index=0, team=None, best_team=None):
    """Find the best combination of 5 players that falls under the cap"""
    if team is None:
        team = Team()
        best_team = Team()
        get_best_team.counter = 0
     
    if team.complete():
        # Is this the best team so far?
        get_best_team.counter += 1
        if team.score > best_team.score:
            best_team.players = team.players
        return best_team
 
    for i in range(index, len(players)):
        if players[i].avg <= team.cap:
            team.push(players[i])
            get_best_team(players, i+1, team, best_team)
            team.pop()
    return best_team
 
# Sort players so most expensive  (avg) are first.
players = sorted([Player(i) for i in range(1, 61)], key=lambda p:p.avg, reverse=True)
start = time()
print(get_best_team(players))
print(time() - start, get_best_team.counter)

            

        
      

I ran some tests comparing computing combinations and using the recursive looping function. Time is in seconds.

Output:#Teams  Comb Time  Loop Time
    40       1.38       0.11
    50       4.52       0.44
    60      11.59       0.75
    70      26.55       1.75
    80      67.20       3.58

Timing is less deterministic using this approach, but a very rough estimate for creating teams with a pool of 200 players is 6.3 minutes, and 3.5 hours for a pool of 400 players. When I wrote the program in C it takes under 5 minutes to solve for a pool of 400 players.

In case you were wondering how loops can replace combinations, just take a look at the combinations generated here:

        
              from itertools import combinations
print(list(combinations((1, 2, 3, 4), 2)))

Output:
[(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

And the output from this program:

        
              def combinations(numbers, index=0, combo=None, combos=None):
    if combo is None:
        combo = []
        combos = []
 
    if len(combo) >= 2:
        combos.append(combo.copy())
        return combos
 
    for i in range(index, len(numbers)):
        combo.append(numbers[i])
        combinations(numbers, i+1, combo, combos)
        combo.pop()
    return combos
 
print(combinations((1, 2, 3, 4)))

Output:
[[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]

KoinKoin · (This post was last modified: Jan-04-2023, 09:44 AM by KoinKoin.)

(Jan-04-2023, 06:15 AM)deanhystad Wrote: Timing is less deterministic using this approach, but a very rough estimate for creating teams with a pool of 200 players is 6.3 minutes, and 3.5 hours for a pool of 400 players. When I wrote the program in C it takes under 5 minutes to solve for a pool of 400 players.

OMG ! That's impressive ! Guess I'll have to look for that langage I didn't really like and didn't do for more than 10 years hahaha

Did you kinda "convert" your last program (with the "get_best_team") to C or you used something like "Cython" (I just read about it, never heard of it before) ?

I'm really so impress by the time difference... 5 minutes versus houuuuuuurs

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	How to update score value?	Mavoz	5	5,218	Nov-08-2022, 12:37 AM Last Post: Mavoz
	Algorithm to generate a combination	guvinicius2	5	3,786	Aug-15-2020, 10:42 PM Last Post: deanhystad
	loops in combination with lists	Juru	4	3,917	May-07-2020, 02:58 PM Last Post: Marbelous
	Average score	MartinEvtimov	5	8,614	Apr-02-2017, 07:35 PM Last Post: ichabod801

Loop to find the best combination/score

User Panel Messages

Announcements