Apr-25-2022, 01:54 AM
(This post was last modified: Apr-25-2022, 01:54 AM by deanhystad.)
I am stepping up from "pretty sure" to "I know it is wrong." When you make a node, the numbers in the list are the node numbers of that node's neighbors. For example, node 8 has the following neighbors:
up = 12, right = out of range, down = 4, left = 7
You correctly create node 8 like this:
    Node_8 = Node(8, [12, self.wall, 4, 7])
where self.wall is None.
You correctly create nodes 6, 7, 8, 10, 11, 12, 13 and 14 using this same pattern: specifying the node number for each neighbor and using None when there is no neighbor in that direction.
You incorrectly create nodes 1, 2, 3, 4, 5, 9, 15 and 16. In each of these nodes, moving in a direction that takes you off the board (out of the environment) magically teleports you back to node 1.
The error is not immediately obvious. Node 1 is tucked away in a corner and passing through node 1 usually isn't the shortest path to the goal, but look at what happens if I move the start point up to node 12.
Output:
1 2 5 3 p
1 goal
2 goal
3 wall-square
4 right
5 forbid
6 down
7 left
8 down
9 left
10 up
11 up
12 up
13 up
14 up
15 up
16 up
Node 4 moves to the right to use the teleport pad to node 1. Nodes 10 through 16 also take advantage of the teleporter.
Your program has a lot of errors. The errors in setting up the node neighbors are the worst. These can make your program return incorrect results.
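One way to avoid typing neighbor lists by hand is to compute them from the node number. The following is only a sketch, not your code: it assumes nodes are numbered 1 to 16 row by row from the bottom-left corner (which matches node 8 having up = 12 and left = 7) and that the list order is [up, right, down, left]. The names neighbors, find_one_way_links, WIDTH and the broken table are made up for illustration.

```python
WIDTH = 4   # assumed board width (the board in this thread is 4 x 4)
HEIGHT = 4  # assumed board height


def neighbors(node, width=WIDTH, height=HEIGHT):
    """Return [up, right, down, left] node numbers, None when off the board."""
    row, col = divmod(node - 1, width)  # row 0 is the bottom row
    up = node + width if row < height - 1 else None
    right = node + 1 if col < width - 1 else None
    down = node - width if row > 0 else None
    left = node - 1 if col > 0 else None
    return [up, right, down, left]


def find_one_way_links(table):
    """Return (node, direction_index) pairs whose neighbor does not link back."""
    bad = []
    for node, links in table.items():
        for d, other in enumerate(links):
            # The opposite of direction d is (d + 2) % 4: up<->down, right<->left.
            if other is not None and table[other][(d + 2) % 4] != node:
                bad.append((node, d))
    return bad


table = {n: neighbors(n) for n in range(1, WIDTH * HEIGHT + 1)}
print(table[8])                    # [12, None, 4, 7]
print(find_one_way_links(table))   # []

# Hypothetical mistake: moving right from node 4 "teleports" to node 1.
broken = dict(table)
broken[4] = [8, 1, None, 3]
print(find_one_way_links(broken))  # [(4, 1)]
```

The second function is a cheap sanity check: on a plain grid every link should be reversible, so any pair it reports is a neighbor entry worth a second look.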
You have other errors that don't affect results. This code works fine now, but would blow up if your environment was larger than 4 x 4:
    position = 0
    while position < LEVEL:
        if current_episode.next[position] is not None:
            current_episode.move.insert(position, Direction(position)), current_episode.qValues.insert(position, False)
        position += 1
The loop should execute 4 times because there are 4 Directions, not because the environment is 4 nodes wide or 4 nodes high. The code should be this:
    for direction in Direction:
        if current_episode.next[direction.value] is not None:
            current_episode.move[direction.value] = direction
            current_episode.qValues[direction.value] = False
Or even better, make Direction an IntEnum instead of an Enum. Now you can use a Direction as if it were an integer.
    class Direction(enum.IntEnum):
        up = 0
        right = 1
        down = 2
        left = 3
    ...
    for direction in Direction:
        if current_episode.next[direction] is not None:
            current_episode.move[direction] = direction
            current_episode.qValues[direction] = False
And you have bugs that will never cause a problem, but they do not leave a good impression with those reading your code.
You define these constants but they are never used:
    HEIGHT = 4
    WEIGHT = 4
You pass "print_best_actions" and "index" arguments to Q_learning but they are not used.
Q_learning computes a "total_episode_reward" that is never used for anything.
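Going back to the IntEnum suggestion above, here is a small self-contained sketch of why it helps. It does not use your classes: next_nodes and moves are stand-in lists I made up, filled with node 8's neighbors, and PlainDirection exists only to show the contrast with a plain Enum.

```python
import enum


class PlainDirection(enum.Enum):
    up = 0
    right = 1
    down = 2
    left = 3


class Direction(enum.IntEnum):
    up = 0
    right = 1
    down = 2
    left = 3


next_nodes = [12, None, 4, 7]      # node 8's neighbors: up, right, down, left
moves = [None] * len(Direction)    # one slot per direction, not per board row

for direction in Direction:
    # An IntEnum member is an int, so it can index a list directly.
    if next_nodes[direction] is not None:
        moves[direction] = direction

move_names = [m.name if m is not None else None for m in moves]
print(move_names)  # ['up', None, 'down', 'left']

# A plain Enum member is not an int, so indexing needs .value.
try:
    next_nodes[PlainDirection.up]
    plain_enum_indexes = True
except TypeError:
    plain_enum_indexes = False
print(plain_enum_indexes)  # False
```

Sizing the list with len(Direction) also removes the hidden dependency on the board being 4 nodes wide.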