Python Forum
I want to create an Othello AI.
I have written an Othello game for my mother and would like to implement a single-player mode, which means writing an AI for the player to play against. All of my previous attempts at a good Othello AI used minimax, a set of positional score maps and pattern-checking heuristics, or a combination of the two. The problem is that the bot just wasn't smart enough.

After much research, I found that Othello, the game which "takes minutes to learn but a lifetime to master," is as hard for computers as it is for people. Even the first bot to win against a skilled human player back in 1997 used machine-learning neural networks, not minimax. The best performers have used NEAT, which is notoriously hard to implement, but I've decided to take a slightly different approach which I've dubbed Meme Evolution Learning (MEL).

My decision to take this approach is based on two main factors: first, I think it will be at least a little easier to implement than NEAT, and second, I've seen no evidence of others using meme evolution for machine learning. If this works, I could publish a paper on the subject, and MEL may have a great many applications for which RNNs and neuroevolution-based algorithms are ill suited. Moreover, a successful implementation of MEL could be evidence in support of meme theory.

The reason I'm posting here today is that I'm lost as to how best to implement crossover, just as I was when I tried to implement NEAT; it's a problem I've found extremely daunting. The way it's supposed to work is that the training algorithm starts out with a fixed-size population of agents, each of which has a randomly generated meme that it executes. The meme in this implementation is represented by a list of tuples. Every first tuple is of the form (op, crossoverID); this is subject to change, as I'm still unsure how to implement crossover. The op field is an index into a tuple of first-class functions which implement all the functionality required for Turing completeness. Every second tuple contains the parameters passed to that function when the meme is executed.
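
To make the layout concrete, here is a minimal sketch of two memes in this format together with a plain one-point crossover that only cuts on instruction boundaries. This is not NEAT-style crossover, and it ignores the crossoverID field entirely. The name one_point_crossover, its rng parameter, and the sample memes are made up for illustration and are not part of the module below.

```python
import random

# Two sample memes: even slots hold (op, crossoverID), odd slots hold the argument tuple.
meme_a = [(1, 0), (3, 7, 9), (4, 1), (2, 5, 6)]
meme_b = [(0, 2), (), (17, 3), (0.5, 12)]

def one_point_crossover(parent_a, parent_b, rng=random):
    """Splice the head of parent_a onto the tail of parent_b.

    Each instruction occupies two slots, so cut points are restricted to
    even indices. That keeps every (op, crossoverID) tuple next to its own
    argument tuple in the child.
    """
    cut_a = 2 * rng.randint(0, len(parent_a) // 2)
    cut_b = 2 * rng.randint(0, len(parent_b) // 2)
    child = parent_a[:cut_a] + parent_b[cut_b:]
    # An empty child can occur when cut_a is 0 and cut_b is the end of parent_b;
    # fall back to a copy of parent_a in that case.
    return child if child else list(parent_a)

child = one_point_crossover(meme_a, meme_b)
```

Because the cuts can differ between the parents, children may be longer or shorter than either parent, so some cap on meme length would probably be needed in practice.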

During training, the run function executes the meme and returns a tuple containing a Boolean flag indicating whether an error occurred and the agent's output. This tuple is passed to a scoring algorithm, which scores the agent based on the quality of its output and applies a penalty if the agent raised exceptions while executing its meme.
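
A rough sketch of that contract, under some assumptions: the meme is stood in for by a list of plain callables that write into the output list, and the names run_meme, penalized_score, and error_penalty are hypothetical. The real run function would dispatch through the behaviors tuple instead.

```python
def run_meme(steps, output_size):
    """Execute each step in order and return (error_occurred, output).

    The output list is pre-filled with zeros, so a partial result is still
    available to the scorer if a step raises.
    """
    output = [0.0] * output_size
    error_occurred = False
    try:
        for step in steps:
            step(output)
    except Exception:
        # Any exception stops execution and is reported through the flag.
        error_occurred = True
    return error_occurred, output

def penalized_score(score_output, result, error_penalty=1.0):
    """Score the output, then subtract a fixed penalty if an error was flagged."""
    error_occurred, output = result
    score = score_output(output)
    if error_occurred:
        score -= error_penalty
    return score
```

A fixed penalty is only one option; scaling it by how early the error occurred would also fit the description.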

The agents in the population then "converse," exchanging scoring data and memes with one another. Crossover and mutation are applied to the memes, and the poorest-performing memes are discarded and replaced with better-performing ones. When a preset number of generations is reached or the meme within an agent reaches a maximum score, training ends and Population.train returns the best-performing meme so it can be saved to a file, loaded from a file, passed into an Agent's constructor, and used to perform the task for which the meme was generated. As a final note, infinite loops and infinite recursion are prevented by limiting the length of the callStack instance variable and setting a time limit on the agent.
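
The generation loop described above might look roughly like the sketch below. It is a simplification under stated assumptions: the "conversation" between agents is reduced to ranking the whole population, the bottom half is replaced by mutated copies of survivors, and crossover is left out. The function train and its score, mutate, and rng parameters are placeholders, not the planned Population.train.

```python
import random

def train(population, score, mutate, max_generations, max_score, rng=random):
    """Return the best meme found within max_generations, or earlier if one reaches max_score."""
    population = list(population)
    for _ in range(max_generations):
        ranked = sorted(population, key=score, reverse=True)
        if score(ranked[0]) >= max_score:
            break  # A meme reached the target score; stop early.
        # Keep the better half and discard the poorest performers.
        keep = max(1, len(ranked) // 2)
        survivors = ranked[:keep]
        # Refill the population with mutated copies of randomly chosen survivors.
        offspring = [mutate(rng.choice(survivors)) for _ in range(len(ranked) - keep)]
        population = survivors + offspring
    return max(population, key=score)
```

Here score is called repeatedly on the same memes; if scoring means playing full games of Othello, caching each meme's score per generation would be worth doing.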

Most of this hasn't been implemented yet, but I can show the code I've written so far. To avoid filling this module (which I may make available for download when it's complete) with unnecessary code, I want to be sure that all the implementation details are worked out to the best of my ability.

My code:
"""
This module implements all the functionality necessary to develop machine learning algorithms based on the propagation of memes within a population. The memes mutate as they are passed from one agent to another and are tested by using the scoring function supplied in the argument list of the population instance.

The memes are encoded as lists; each containing a sequence of tuples where every first tuple contains an index to a tuple of first-class functions in the first index and a crossover ID used by the genetic algorithm in the second index and every second tuple is an arg list containing all the arguments; all integers and floating-point values. The genetic algorithm includes any and all errors generated by the memes being tested in the score calculations used by the selection algorithm. It then applies mutation and crossover to the memes and tests them again by running them in instances of Agent. This process is repeated until either the preset number of generations is reached or an individual meme reaches the preset maximum score.
"""
import random

class Agent:
    """
    memeLearning.Agent: class
    This class is used to create
    """
    def __init__ (self, inputSize, outputSize, existingMeme = None, nextAvailableCrossoverID = 0, memSize = 65536):
        """
        Agent.__init__(self, inputSize, outputSize, existingMeme, nextAvailableCrossoverID, memSize)
        This method initializes an instance of Agent.

        The inputSize and outputSize parameters specify the number of elements in an input and the number of elements in an output. Both the input and the output are always lists of floating-point values. The genetic algorithm ensures that the meme which is generated interprets the inputs correctly and generates the correct outputs.

        The parameter existingMeme is used to give the Agent a pre-generated behavior; for instance, when the algorithm has been trained and a meme is loaded from a file to be executed by an instance of Agent in the application phase (after training.) If this parameter is None, the meme executed by this agent is initially set to a randomly generated sequence of behaviors.

        The nextAvailableCrossoverID parameter is used by the genetic algorithm to ensure that all genes have unique crossover IDs.

        The memSize parameter is used to control how much memory the agent has available to it. The default is 65536 cells, each holding one floating-point value.
        """
        self.inputSize = inputSize
        self.inputData = [0.0 for i in range(self.inputSize)]
        self.outputSize = outputSize
        self.output = [0.0 for i in range(self.outputSize)]
        self.nextAvailableCrossoverID = nextAvailableCrossoverID

        # All of the low-level behaviors of which the agent is capable stored in a tuple.
        # The activate method is used to execute these behaviors in the order and with the
        # parameters specified in the meme. There is no fixed integer word size since Python
        # ints are arbitrary precision; floats are 64-bit doubles.
        self.behaviors = (
                          # All of the arithmetic and logic behaviors.
                          self.bExit,
                          self.bAdd,
                          self.bSub,
                          self.bMul,
                          self.bDiv,
                          self.bExp,
                          self.bCmp,
                          self.bAnd,
                          self.bOr,
                          self.bNot,
                          self.bXor,

                          # All flow-control behaviors:
                          self.bJfl,    # Jump if the flag is True.
                          self.bSfl,    # Jump if the flag is False. (Stay if Flag)
                          self.bCall,   # Call a function.
                          self.bRet,    # Return from a function.

                          # Input and output:
                          # The input is made read-only to prevent it from being altered in a way
                          # that renders it useless to other behaviors in the meme.
                          self.bGetInput,
                          self.bSetOutput,

                          # A behavior that enables the agent to store literals in its memory:
                          self.bLoadLiteral
                          )

        self.callStack = []
        self.ip = 0
        self.lastArithmeticResult = 0.0
        self.memSize = memSize
        self.memory = [0.0 for i in range(self.memSize)]
        self.isActing = False

        if existingMeme is None:
            self.meme = self.generateNewRandomMeme(5)
        else:
            self.meme = existingMeme

    def generateNewRandomMeme (self, num_behaviors):
        # For each random behavior we want to add, we use the attribute __defaults__ to get a tuple of all the default parameters,
        # create an empty list, iterate over the tuple, and use the data types of the defaults to fill the empty parameter list with
        # parameters of the appropriate type.
        aMeme = []
        for i in range(num_behaviors):
            opIndex = random.randrange(len(self.behaviors))
            aMeme.append( (opIndex, self.nextAvailableCrossoverID) )
            # Advance the counter so that every gene gets a unique crossover ID.
            self.nextAvailableCrossoverID += 1
            argList = []
            # __defaults__ is None when a behavior has no default parameters.
            for d in self.behaviors[opIndex].__defaults__ or ():
                if type(d) is int:
                    # Integer parameters are assumed to be memory addresses; valid indices are 0 .. memSize - 1.
                    argList.append(random.randrange(self.memSize))

                elif type(d) is float:
                    argList.append(random.uniform(-2**32 + 2, 2**32 - 1))

            # Store the argument tuple in the meme, right after its (op, crossoverID) tuple.
            aMeme.append(tuple(argList))

        return aMeme

    # Implementations of all the behaviors of which the agent is capable:
    
I've never done anything machine-learning related, so any advice you can give me will be greatly appreciated. Apologies for the really long comments and docstrings.