Python Forum

Full Version: Loop to find the best combination/score
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Pages: 1 2 3
Hello everyone,

I modified my code, it should be faster than the last one above, but still pretty long when I need to find 5 of 60...
Instead of comparing every combination like before, I putting in a dict index/score, then I sort by score and looking at the first indexes where sum-average) - sum(MVP) is under 121:
    df1 = pd.DataFrame.from_dict(data)
    for index in list(combinations(df1.index,5)):
        dictIndexScore[index] = df1.loc[index,:]["Score1"].sum(axis=0)
    for i in sorted(dictIndexScore.items(), key=lambda x:x[1], reverse=True):
        if df1.loc[i[0],:]["10Aver1"].sum(axis=0) - df1.loc[i[0],:]["10Aver1"].max(axis=0) < 121:
            bestIndex = i[0]
            bestSum = i[1]
            break

    print(df1.loc[bestIndex,:])
My program did 3366.6575298309326 seconds for a 60 rows dataframe, so 5461512 combinations possible of 5 in 60...

Do you have any tips/hints to go faster ?
I read numpy is faster than dataframe, but didn't really work yet with it
I would not do this in numpy. For what you are doing, numpy just slows things down.

I wrote a python program that made 60 random players and picked the best team using combinations of 5 players. It takes 14 seconds to run.
from itertools import combinations
from random import randint
from time import time

class Player:
    def __init__(self, name):
        self.name = name
        self.avg = randint(1, 100)
        self.score = randint(1, 200)

    def __repr__(self):
        return f'{self.name} score={self.score}, avg={self.avg}'


class Team:
    def __init__(self, players):
        self.players = sorted(players, key=lambda x: x.score, reverse=True)
        self.score = sum(p.score for p in players)
        self.cap = sum(p.avg for p in players) - self.players[0].avg

    def __repr__(self):
        players = '\n'.join(str(p) for p in self.players)
        return f'{players}\nscore={self.score}, cap={self.cap}\n'


start = time()
players = [Player(i) for i in range(1, 61)]
teams = [Team(pics) for pics in combinations(players, 5)]
teams = [team for team in teams if team.cap <= 120]
best_team = max(teams, key=lambda p: p.score)
print(time() - start)
print(best_team)
A big time reduction was realized when I excluded players scoring below the league average. Now my runtime is under a second.
from itertools import combinations
from random import randint
from time import time

class Player:
    def __init__(self, name):
        self.name = name
        self.avg = randint(1, 100)
        self.score = randint(1, 200)

    def __repr__(self):
        return f'{self.name} score={self.score}, avg={self.avg}'


class Team:
    def __init__(self, players):
        self.players = sorted(players, key=lambda x: x.score, reverse=True)
        self.score = sum(p.score for p in players)
        self.cap = sum(p.avg for p in players) - self.players[0].avg

    def __repr__(self):
        players = '\n'.join(str(p) for p in self.players)
        return f'{players}\nscore={self.score}, cap={self.cap}\n'


start = time()
players = [Player(i) for i in range(1, 61)]
avg_score = sum(p.score for p in players) / len(players)
players = [player for player in players if player.score > avg_score]
print(len(players))
teams = [team for team in teams if team.cap <= 120]
best_team = max(teams, key=lambda p: p.score)
print(time() - start)
print(best_team)
Wow thank you very much @deanhystad !
I adapted it to my script and work great!

May I ask the difference between my code and yours ?
I feel like the step was the same, only the method was different.

I was using a df for my players list :
df1 = pd.DataFrame.from_dict(data)
You used a class :
players = [Player(i) for i in range(1, 61)]
Then, combinations of teams + score :
for index in list(combinations(df1.index,5)):
    dictIndexScore[index] = df1.loc[index,:]["Score1"].sum(axis=0)
Again you used class to do the same :
teams = [Team(pics) for pics in combinations(players, 5)]
Finally to get the best team, i was sorting my dict and test the avg value:
for i in sorted(dictIndexScore.items(), key=lambda x:x[1], reverse=True):
    if df1.loc[i[0],:]["10Aver1"].sum(axis=0) - df1.loc[i[0],:]["10Aver1"].max(axis=0) < 121:
        bestIndex = i[0]
        bestSum = i[1]
        break
You looked for the max when avg is under 121:
teams = [team for team in teams if team.cap <= 120]
best_team = max(teams, key=lambda p: p.score)
Using class instead of df is the reason why it's faster ?
You should time each of the steps to determine which is slow. Knowing exactly what is slow should help illuminate why.
(Dec-31-2022, 04:25 PM)deanhystad Wrote: [ -> ]You should time each of the steps to determine which is slow. Knowing exactly what is slow should help illuminate why.
Run the code with
Output:
python -m cProfile myscript.py
I already know that this code :
for index in list(combinations(df1.index,5)):
    dictIndexScore[index] = df1.loc[index,:]["Score1"].sum(axis=0)
is taking long time, I already saw that without cProfile, I used some "time() - start" to see that.

I was just curious to understand, because I never really used class and I wanted to know if it was the reason...
But huge thanks for your help.
You should find out if making the combinations or computing the dcitIndexScore is taking all the time. My guess is the latter.

Classes are just a way to organize code. Classes are used to make code more compact, more flexible, and easier to understand. Classes usually don't have any speed of execution benefit. I could write the same code without classes and it would execute just as fast.
from itertools import combinations
from random import randint
from time import time

def Player(name):
    return (name, randint(1, 200), randint(1, 100))

def player_str(player):
    return f'{player[0]} score={player[1]}, avg={player[2]}'

def Team(players):
    return (players, sum(p[1] for p in players), sum(p[2] for p in players), - players[0][1])

def team_str(team):
    players = '\n'.join(player_str(p) for p in team[0])
    return f'{players}\nscore={team[1]}, cap={team[2]}\n'

start = time()
players = [Player(i) for i in range(1, 61)]
players = sorted(players, key=lambda p: p[1], reverse=True)
teams = [Team(pics) for pics in combinations(players, 5)]
teams = [team for team in teams if team[2] <= 120]
best_team = max(teams, key=lambda t: t[1])
print(time() - start)
print(team_str(best_team))
I might be crazy asking that, but what if I have like 400 players and I need the best 5 ?
It's like 84 billions combinations possible...
I tried to play with that but I got a "Memory error" at some point" Doh

Do you think it's a good option to go this way :
teams = [team for team in [Team(pics) for pics in combinations(players, 5)] if team.cap <= 120]
best_team = max(teams, key=lambda p: p.score)
If you want to do 400 players and not do anything to reduce cull the list (remove players with low scores and/or poor value) you can get a supercomputer.

Memory is not a problem. You are probably making a list to hold all the teams. Instead, you should do one team at a time. Itertools.combinations is a lazy iterator and generates combinations when requested.

Pseudo code:
best_team = None
for team in itertools.combinations(players, 5):
    if team.cap_score <= 120:
        if best_team is None or best_team.score < team.score:
            best_team = team
print(best_team)
That fixes the memory problems, but you still have the time problems. I was playing around measuring times for different numbers of players,

30 players = 0.27 seconds
40 players = 1.38 seconds
50 players = 4.52 seconds
60 players = 11.59 seconds
70 players = 26.55 seconds
400 players = 53 hours (estimated)

The easiest way to speed things up is reduce the number of players. Ignore players scoring under the league average and or players that have a low score/average value.

But even if you threw away 200 players, you would still have 200 players to evaluate. Extrapolating from my tests I would expect it to take about 1.6 hours to find the best team that could be made from a pool of 200 players.

You might be able to speed things up by reducing the player list as you are building the team. This is a recursive solution that skips players a team cannot afford. Depending on how early in the search you skip a player this can prevent having to test thousands of teams that don't fit under the cap.
from random import randint
from time import time

class Player:
    """Player info"""
    def __init__(self, id):
        self.id = id
        self.avg = randint(0, 100)
        self.score = max(0, self.avg +randint(-50, 50))

    def __str__(self):
        return self.__repr__()

    def __repr__(self):
        return f'{self.id} avg={self.avg} score={self.score}'

class Team:
    """A team with 5 players."""
    def __init__(self):
        self._players = []
        self.score = 0
        self.cap = 120

    def __str__(self):
        return self.__repr__()

    def __repr__(self):
        players = '\n'.join(str(p) for p in self._players)
        return f'{players}\ncap={self.cap} score={self.score}\n'

    def push(self, player):
        """Add player to the roster"""
        if self._players:
            self.cap -= player.avg
        self.score += player.score
        self._players.append(player)

    def pop(self):
        """Remove last player added to the roster"""
        player = self._players.pop()
        if self._players:
            self.cap += player.avg
        self.score -= player.score

    def complete(self):
        """Return True if team has all 5 players"""
        return len(self.players) >= 5

    @property
    def players(self):
        """Return list of players"""
        return self._players

    @players.setter
    def players(self, roster):
        """Set list of players"""
        self._players[:] = roster
        self.cap = 120 - sum(p.avg for p in roster[1:])
        self.score = sum(p.score for p in roster)


def get_best_team(players, index=0, team=None, best_team=None):
    """Find the best combination of 5 players that falls under the cap"""
    if team is None:
        team = Team()
        best_team = Team()
        get_best_team.counter = 0
    
    if team.complete():
        # Is this the best team so far?
        get_best_team.counter += 1
        if team.score > best_team.score:
            best_team.players = team.players
        return best_team

    for i in range(index, len(players)):
        if players[i].avg <= team.cap:
            team.push(players[i])
            get_best_team(players, i+1, team, best_team)
            team.pop()
    return best_team

# Sort players so most expensive  (avg) are first.
players = sorted([Player(i) for i in range(1, 61)], key=lambda p:p.avg, reverse=True)
start = time()
print(get_best_team(players))
print(time() - start, get_best_team.counter)
I ran some tests comparing computing combinations and using the recursive looping function. Time is in seconds.
Output:
#Teams Comb Time Loop Time 40 1.38 0.11 50 4.52 0.44 60 11.59 0.75 70 26.55 1.75 80 67.20 3.58
Timing is less deterministic using this approach, but a very rough estimate for creating teams with a pool of 200 players is 6.3 minutes, and 3.5 hours for a pool of 400 players. When I wrote the program in C it takes under 5 minutes to solve for a pool of 400 players.

In case you were wondering how loops can replace combinations, just take a look at the combinations generated here:
from itertools import combinations
print(list(combinations((1, 2, 3, 4), 2)))
Output:
[(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
And the output from this program:
def combinations(numbers, index=0, combo=None, combos=None):
    if combo is None:
        combo = []
        combos = []

    if len(combo) >= 2:
        combos.append(combo.copy())
        return combos

    for i in range(index, len(numbers)):
        combo.append(numbers[i])
        combinations(numbers, i+1, combo, combos)
        combo.pop()
    return combos

print(combinations((1, 2, 3, 4)))
Output:
[[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]
(Jan-04-2023, 06:15 AM)deanhystad Wrote: [ -> ]Timing is less deterministic using this approach, but a very rough estimate for creating teams with a pool of 200 players is 6.3 minutes, and 3.5 hours for a pool of 400 players. When I wrote the program in C it takes under 5 minutes to solve for a pool of 400 players.
OMG ! That's impressive ! Guess I'll have to look for that langage I didn't really like and didn't do for more than 10 years hahaha

Did you kinda "convert" your last program (with the "get_best_team") to C or you used something like "Cython" (I just read about it, never heard of it before) ?

I'm really so impress by the time difference... 5 minutes versus houuuuuuurs
Pages: 1 2 3