Jan-06-2019, 07:30 PM
I am working on my first project with machine learning: training an AI on a game of Tic Tac Toe using rewards and losses. I have everything but this problem solved on paper. I can't figure out how to add the AI's moves to an array and give them a reward.
The way I wanted to do this was to use a huge array that stores all the information in this format: whether the bot started or its opponent started (1 or 2), then the move number (1, 2, 3, 4, and sometimes 5), then which square it chose, and at the end either add 1 for a win or subtract 1 for a loss. This is how I thought I should format my code:
# 1, 2, 3
# 4, 5, 6
# 7, 8, 9
oBoard = []
# Sets 1 for when the bot starts and 2 for when the opponent starts
for sTurnNum in range(2):
    oBoard.append([sTurnNum + 1])
    for turnNum in range(5):
        # Adds a slot [n] for every possible turn 1-5
        oBoard[sTurnNum].append([turnNum + 1])
print(oBoard)

When I print this, I get:
Output: [[1, [1], [2], [3], [4], [5]], [2, [1], [2], [3], [4], [5]]]
I can also add which square they choose. If I simulate a game where the bot starts first, I get:
Output: [[1, [1, [5]], [2], [3], [4], [5]]]
Bot chooses space 5, player chooses space 1
[[1, [1, [5]], [2, [5, [7]]], [3], [4], [5]]]
Bot chooses space 7, player chooses space 8
[[1, [1, [5]], [2, [5, [7]]], [3, [5, [7, [3]]]], [4], [5]]]
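A minimal sketch of that recording step, assuming the table layout from the code above and that only the bot's own squares are stored (the player's squares 1 and 8 never appear in the outputs). `build_board`, `nest` and `record_bot_move` are hypothetical helper names, not part of the original code:

```python
def build_board():
    """Same table as the original loop: one row per starter (1 or 2),
    each followed by a slot [turn] for turns 1-5."""
    return [[starter] + [[turn] for turn in range(1, 6)] for starter in (1, 2)]


def nest(moves):
    """Turn a flat move list like [5, 7, 3] into the nested form [5, [7, [3]]]."""
    if len(moves) == 1:
        return [moves[0]]
    return [moves[0], nest(moves[1:])]


def record_bot_move(board, starter, bot_moves):
    """Store the bot's moves so far in the slot for the current turn.

    starter is 1 if the bot started, 2 if the opponent started.
    The turn number is just how many moves the bot has made; index 0 of
    each row holds the starter number, so turn n lives at index n.
    """
    turn = len(bot_moves)
    board[starter - 1][turn].append(nest(bot_moves))


oBoard = build_board()
bot_moves = []
for square in (5, 7, 3):  # the bot's squares from the game above
    bot_moves.append(square)
    record_bot_move(oBoard, 1, bot_moves)

print(oBoard[0])
# [1, [1, [5]], [2, [5, [7]]], [3, [5, [7, [3]]]], [4], [5]]
```

Passing the growing move list each turn reproduces the same nesting as the outputs above, so the array only has to be built by appending.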
The bot gets three in a row (3-5-7), so that set of moves should be rewarded with 1:
[[1, [1, [5, [total="1"]]], [2, [5, [7, [total="1"]]]], [3, [5, [7, [3, [total="1"]]]]], [4], [5]]]
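Writing `total="1"` inside a list is not valid Python, so one possible alternative (a swapped-in idea, not the layout from the post) is a dictionary keyed by the starter and the tuple of the bot's moves so far, holding an integer total. The sketch below, with the hypothetical names `totals` and `reward_game`, adds the game's reward to the entry for every turn, which matches giving each turn a total of 1 above:

```python
from collections import defaultdict

# Hypothetical alternative to the nested list: one running total per
# (starter, bot moves so far) key. defaultdict(int) starts unseen keys at 0.
totals = defaultdict(int)


def reward_game(totals, starter, bot_moves, reward):
    """Add reward (+1 for a win, -1 for a loss) to every prefix of the
    bot's move sequence, i.e. to the entry for each turn of the game."""
    for turn in range(1, len(bot_moves) + 1):
        key = (starter, tuple(bot_moves[:turn]))
        totals[key] += reward


# The game above: the bot started (1), played 5, 7, 3 and won.
reward_game(totals, 1, [5, 7, 3], 1)

for key, total in totals.items():
    print(key, total)
# (1, (5,)) 1
# (1, (5, 7)) 1
# (1, (5, 7, 3)) 1
```

Like the original layout, this ignores the player's squares; adding them to the key would let the bot tell different board positions apart.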
Then I want to be able to read that total back, so that once the bot is past its learning trial it can check whether a space has a good total and should be picked.
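A rough sketch of reading the totals back, assuming the dictionary layout `(starter, tuple_of_bot_moves) -> total` rather than the nested list. `choose_square` and its `explore` parameter are hypothetical; the random exploration is a simple epsilon-greedy rule, a common choice but not something from the post:

```python
import random


def choose_square(totals, starter, bot_moves, open_squares, explore=0.0):
    """Pick the bot's next square from the learned totals.

    With probability `explore` a random open square is chosen (useful while
    the bot is still learning); otherwise the open square whose resulting
    move sequence has the highest total is chosen. Unseen sequences count as 0,
    and ties go to the first square in open_squares.
    """
    if random.random() < explore:
        return random.choice(open_squares)
    return max(
        open_squares,
        key=lambda sq: totals.get((starter, tuple(bot_moves) + (sq,)), 0),
    )


# Totals after the single winning game above.
totals = {(1, (5,)): 1, (1, (5, 7)): 1, (1, (5, 7, 3)): 1}

# Bot starts, nothing played yet: square 5 has total 1, every other square 0.
print(choose_square(totals, 1, [], [1, 2, 3, 4, 5, 6, 7, 8, 9]))  # 5
```

Lowering `explore` toward 0 as games go by is one way to move from the learning phase to playing on the totals.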