Python Forum
Loop to find the best combination/score
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Loop to find the best combination/score
#11
Hello everyone,

I modified my code, it should be faster than the last one above, but still pretty long when I need to find 5 of 60...
Instead of comparing every combination like before, I putting in a dict index/score, then I sort by score and looking at the first indexes where sum-average) - sum(MVP) is under 121:
    df1 = pd.DataFrame.from_dict(data)
    for index in list(combinations(df1.index,5)):
        dictIndexScore[index] = df1.loc[index,:]["Score1"].sum(axis=0)
    for i in sorted(dictIndexScore.items(), key=lambda x:x[1], reverse=True):
        if df1.loc[i[0],:]["10Aver1"].sum(axis=0) - df1.loc[i[0],:]["10Aver1"].max(axis=0) < 121:
            bestIndex = i[0]
            bestSum = i[1]
            break

    print(df1.loc[bestIndex,:])
My program did 3366.6575298309326 seconds for a 60 rows dataframe, so 5461512 combinations possible of 5 in 60...

Do you have any tips/hints to go faster ?
I read numpy is faster than dataframe, but didn't really work yet with it
Reply
#12
I would not do this in numpy. For what you are doing, numpy just slows things down.

I wrote a python program that made 60 random players and picked the best team using combinations of 5 players. It takes 14 seconds to run.
from itertools import combinations
from random import randint
from time import time

class Player:
    def __init__(self, name):
        self.name = name
        self.avg = randint(1, 100)
        self.score = randint(1, 200)

    def __repr__(self):
        return f'{self.name} score={self.score}, avg={self.avg}'


class Team:
    def __init__(self, players):
        self.players = sorted(players, key=lambda x: x.score, reverse=True)
        self.score = sum(p.score for p in players)
        self.cap = sum(p.avg for p in players) - self.players[0].avg

    def __repr__(self):
        players = '\n'.join(str(p) for p in self.players)
        return f'{players}\nscore={self.score}, cap={self.cap}\n'


start = time()
players = [Player(i) for i in range(1, 61)]
teams = [Team(pics) for pics in combinations(players, 5)]
teams = [team for team in teams if team.cap <= 120]
best_team = max(teams, key=lambda p: p.score)
print(time() - start)
print(best_team)
A big time reduction was realized when I excluded players scoring below the league average. Now my runtime is under a second.
from itertools import combinations
from random import randint
from time import time

class Player:
    def __init__(self, name):
        self.name = name
        self.avg = randint(1, 100)
        self.score = randint(1, 200)

    def __repr__(self):
        return f'{self.name} score={self.score}, avg={self.avg}'


class Team:
    def __init__(self, players):
        self.players = sorted(players, key=lambda x: x.score, reverse=True)
        self.score = sum(p.score for p in players)
        self.cap = sum(p.avg for p in players) - self.players[0].avg

    def __repr__(self):
        players = '\n'.join(str(p) for p in self.players)
        return f'{players}\nscore={self.score}, cap={self.cap}\n'


start = time()
players = [Player(i) for i in range(1, 61)]
avg_score = sum(p.score for p in players) / len(players)
players = [player for player in players if player.score > avg_score]
print(len(players))
teams = [team for team in teams if team.cap <= 120]
best_team = max(teams, key=lambda p: p.score)
print(time() - start)
print(best_team)
Reply
#13
Wow thank you very much @deanhystad !
I adapted it to my script and work great!

May I ask the difference between my code and yours ?
I feel like the step was the same, only the method was different.

I was using a df for my players list :
df1 = pd.DataFrame.from_dict(data)
You used a class :
players = [Player(i) for i in range(1, 61)]
Then, combinations of teams + score :
for index in list(combinations(df1.index,5)):
    dictIndexScore[index] = df1.loc[index,:]["Score1"].sum(axis=0)
Again you used class to do the same :
teams = [Team(pics) for pics in combinations(players, 5)]
Finally to get the best team, i was sorting my dict and test the avg value:
for i in sorted(dictIndexScore.items(), key=lambda x:x[1], reverse=True):
    if df1.loc[i[0],:]["10Aver1"].sum(axis=0) - df1.loc[i[0],:]["10Aver1"].max(axis=0) < 121:
        bestIndex = i[0]
        bestSum = i[1]
        break
You looked for the max when avg is under 121:
teams = [team for team in teams if team.cap <= 120]
best_team = max(teams, key=lambda p: p.score)
Using class instead of df is the reason why it's faster ?
Reply
#14
You should time each of the steps to determine which is slow. Knowing exactly what is slow should help illuminate why.
Reply
#15
(Dec-31-2022, 04:25 PM)deanhystad Wrote: You should time each of the steps to determine which is slow. Knowing exactly what is slow should help illuminate why.
Run the code with
Output:
python -m cProfile myscript.py
Reply
#16
I already know that this code :
for index in list(combinations(df1.index,5)):
    dictIndexScore[index] = df1.loc[index,:]["Score1"].sum(axis=0)
is taking long time, I already saw that without cProfile, I used some "time() - start" to see that.

I was just curious to understand, because I never really used class and I wanted to know if it was the reason...
But huge thanks for your help.
Reply
#17
You should find out if making the combinations or computing the dcitIndexScore is taking all the time. My guess is the latter.

Classes are just a way to organize code. Classes are used to make code more compact, more flexible, and easier to understand. Classes usually don't have any speed of execution benefit. I could write the same code without classes and it would execute just as fast.
from itertools import combinations
from random import randint
from time import time

def Player(name):
    return (name, randint(1, 200), randint(1, 100))

def player_str(player):
    return f'{player[0]} score={player[1]}, avg={player[2]}'

def Team(players):
    return (players, sum(p[1] for p in players), sum(p[2] for p in players), - players[0][1])

def team_str(team):
    players = '\n'.join(player_str(p) for p in team[0])
    return f'{players}\nscore={team[1]}, cap={team[2]}\n'

start = time()
players = [Player(i) for i in range(1, 61)]
players = sorted(players, key=lambda p: p[1], reverse=True)
teams = [Team(pics) for pics in combinations(players, 5)]
teams = [team for team in teams if team[2] <= 120]
best_team = max(teams, key=lambda t: t[1])
print(time() - start)
print(team_str(best_team))
Reply
#18
I might be crazy asking that, but what if I have like 400 players and I need the best 5 ?
It's like 84 billions combinations possible...
I tried to play with that but I got a "Memory error" at some point" Doh

Do you think it's a good option to go this way :
teams = [team for team in [Team(pics) for pics in combinations(players, 5)] if team.cap <= 120]
best_team = max(teams, key=lambda p: p.score)
Reply
#19
If you want to do 400 players and not do anything to reduce cull the list (remove players with low scores and/or poor value) you can get a supercomputer.

Memory is not a problem. You are probably making a list to hold all the teams. Instead, you should do one team at a time. Itertools.combinations is a lazy iterator and generates combinations when requested.

Pseudo code:
best_team = None
for team in itertools.combinations(players, 5):
    if team.cap_score <= 120:
        if best_team is None or best_team.score < team.score:
            best_team = team
print(best_team)
That fixes the memory problems, but you still have the time problems. I was playing around measuring times for different numbers of players,

30 players = 0.27 seconds
40 players = 1.38 seconds
50 players = 4.52 seconds
60 players = 11.59 seconds
70 players = 26.55 seconds
400 players = 53 hours (estimated)

The easiest way to speed things up is reduce the number of players. Ignore players scoring under the league average and or players that have a low score/average value.

But even if you threw away 200 players, you would still have 200 players to evaluate. Extrapolating from my tests I would expect it to take about 1.6 hours to find the best team that could be made from a pool of 200 players.

You might be able to speed things up by reducing the player list as you are building the team. This is a recursive solution that skips players a team cannot afford. Depending on how early in the search you skip a player this can prevent having to test thousands of teams that don't fit under the cap.
from random import randint
from time import time

class Player:
    """Player info"""
    def __init__(self, id):
        self.id = id
        self.avg = randint(0, 100)
        self.score = max(0, self.avg +randint(-50, 50))

    def __str__(self):
        return self.__repr__()

    def __repr__(self):
        return f'{self.id} avg={self.avg} score={self.score}'

class Team:
    """A team with 5 players."""
    def __init__(self):
        self._players = []
        self.score = 0
        self.cap = 120

    def __str__(self):
        return self.__repr__()

    def __repr__(self):
        players = '\n'.join(str(p) for p in self._players)
        return f'{players}\ncap={self.cap} score={self.score}\n'

    def push(self, player):
        """Add player to the roster"""
        if self._players:
            self.cap -= player.avg
        self.score += player.score
        self._players.append(player)

    def pop(self):
        """Remove last player added to the roster"""
        player = self._players.pop()
        if self._players:
            self.cap += player.avg
        self.score -= player.score

    def complete(self):
        """Return True if team has all 5 players"""
        return len(self.players) >= 5

    @property
    def players(self):
        """Return list of players"""
        return self._players

    @players.setter
    def players(self, roster):
        """Set list of players"""
        self._players[:] = roster
        self.cap = 120 - sum(p.avg for p in roster[1:])
        self.score = sum(p.score for p in roster)


def get_best_team(players, index=0, team=None, best_team=None):
    """Find the best combination of 5 players that falls under the cap"""
    if team is None:
        team = Team()
        best_team = Team()
        get_best_team.counter = 0
    
    if team.complete():
        # Is this the best team so far?
        get_best_team.counter += 1
        if team.score > best_team.score:
            best_team.players = team.players
        return best_team

    for i in range(index, len(players)):
        if players[i].avg <= team.cap:
            team.push(players[i])
            get_best_team(players, i+1, team, best_team)
            team.pop()
    return best_team

# Sort players so most expensive  (avg) are first.
players = sorted([Player(i) for i in range(1, 61)], key=lambda p:p.avg, reverse=True)
start = time()
print(get_best_team(players))
print(time() - start, get_best_team.counter)
I ran some tests comparing computing combinations and using the recursive looping function. Time is in seconds.
Output:
#Teams Comb Time Loop Time 40 1.38 0.11 50 4.52 0.44 60 11.59 0.75 70 26.55 1.75 80 67.20 3.58
Timing is less deterministic using this approach, but a very rough estimate for creating teams with a pool of 200 players is 6.3 minutes, and 3.5 hours for a pool of 400 players. When I wrote the program in C it takes under 5 minutes to solve for a pool of 400 players.

In case you were wondering how loops can replace combinations, just take a look at the combinations generated here:
from itertools import combinations
print(list(combinations((1, 2, 3, 4), 2)))
Output:
[(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
And the output from this program:
def combinations(numbers, index=0, combo=None, combos=None):
    if combo is None:
        combo = []
        combos = []

    if len(combo) >= 2:
        combos.append(combo.copy())
        return combos

    for i in range(index, len(numbers)):
        combo.append(numbers[i])
        combinations(numbers, i+1, combo, combos)
        combo.pop()
    return combos

print(combinations((1, 2, 3, 4)))
Output:
[[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]
Reply
#20
(Jan-04-2023, 06:15 AM)deanhystad Wrote: Timing is less deterministic using this approach, but a very rough estimate for creating teams with a pool of 200 players is 6.3 minutes, and 3.5 hours for a pool of 400 players. When I wrote the program in C it takes under 5 minutes to solve for a pool of 400 players.
OMG ! That's impressive ! Guess I'll have to look for that langage I didn't really like and didn't do for more than 10 years hahaha

Did you kinda "convert" your last program (with the "get_best_team") to C or you used something like "Cython" (I just read about it, never heard of it before) ?

I'm really so impress by the time difference... 5 minutes versus houuuuuuurs
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  How to update score value? Mavoz 5 2,390 Nov-08-2022, 12:37 AM
Last Post: Mavoz
  Algorithm to generate a combination guvinicius2 5 2,483 Aug-15-2020, 10:42 PM
Last Post: deanhystad
  loops in combination with lists Juru 4 2,758 May-07-2020, 02:58 PM
Last Post: Marbelous
  Average score MartinEvtimov 5 6,717 Apr-02-2017, 07:35 PM
Last Post: ichabod801

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020