Apr-24-2022, 02:59 AM
As far as this:
Quote: Can you elaborate on insert() to set move or values?
What do you think list.insert(index, value) does? It adds a new element at that position and shifts the rest of the list along. It is not the correct way to SET values in a list.
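A small example of the difference, using a made-up list of moves (the list contents are only for illustration, not taken from your code):

```python
moves = ["up", "up", "up"]

# insert() adds a new element and shifts the rest to the right; the list grows.
inserted = list(moves)
inserted.insert(1, "left")
print(inserted)   # ['up', 'left', 'up', 'up']

# Index assignment replaces the element in place; the length stays the same.
assigned = list(moves)
assigned[1] = "left"
print(assigned)   # ['up', 'left', 'up']
```

If the list is meant to hold one move or value per square, index assignment is the one that keeps each entry lined up with its square.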
I would like you to describe your Q-learning algorithm. Without knowing what you think it is supposed to do, I cannot tell whether the problem is an error in logic, an error in execution, or my not understanding your requirements. For example, you always start at the same square, but it appears that you want a solution for the entire map. I can design a map where that is not possible. It also seems a very inefficient way to generate a map, and even with 10000 iterations there remains a possibility that you do not try all routes. You calculate a total episode reward that is never used. You pass arguments to Q_learning() that are never used.
I think I implemented your algorithm and I get this result for inputs: 15 12 8 6 p
Output:
13 →   14 →   15 G   16 ←
 9 ↑   10 ↑   11 ↑   12 G
 5 ↑    6 W    7 ↑    8 F
 1 ↑    2 →    3 ↑    4 ←
I think this is correct. 16 cannot be "Up" as you indicate in your expected results. 16 is in the top row, so it does not have an "Up" neighbor.
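To show what I mean about start squares and blocked moves, here is a minimal sketch of tabular Q-learning on the same 4x4 map. It is not your code. The map comes from the inputs (goals 15 and 12, forbidden 8, wall 6), but the rewards (+100, -100, -0.1 per step), alpha, gamma, epsilon, the episode cap and the rule that a blocked move leaves you on the same square are all my assumptions. The names step() and q_learning() are made up for this example.

```python
import random

ROWS, COLS = 4, 4
GOALS, FORBIDDEN, WALL = {15, 12}, {8}, 6          # from the inputs 15 12 8 6
ACTIONS = {"up": COLS, "down": -COLS, "left": -1, "right": 1}

def step(square, action):
    """Return the square reached from `square`; stay put if the move is blocked."""
    col = (square - 1) % COLS
    if action == "left" and col == 0:
        return square
    if action == "right" and col == COLS - 1:
        return square
    target = square + ACTIONS[action]
    if target < 1 or target > ROWS * COLS or target == WALL:
        return square
    return target

def q_learning(episodes=5000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    terminal = GOALS | FORBIDDEN
    starts = [s for s in range(1, ROWS * COLS + 1)
              if s not in terminal and s != WALL]
    q = {s: {a: 0.0 for a in ACTIONS} for s in starts}
    for _ in range(episodes):
        square = rng.choice(starts)                # random start square each episode
        for _ in range(100):                       # assumed cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))           # explore
            else:
                action = max(q[square], key=q[square].get)   # exploit
            nxt = step(square, action)
            if nxt in GOALS:
                reward, future = 100.0, 0.0        # assumed goal reward
            elif nxt in FORBIDDEN:
                reward, future = -100.0, 0.0       # assumed forbidden penalty
            else:
                reward, future = -0.1, max(q[nxt].values())  # assumed step cost
            q[square][action] += alpha * (reward + gamma * future - q[square][action])
            if nxt in terminal:
                break
            square = nxt
    return q

q = q_learning()
policy = {s: max(q[s], key=q[s].get) for s in q}
```

Picking a random start square each episode means every reachable square gets visited, which a single fixed start cannot guarantee. With these assumptions step(16, "up") returns 16, so "Up" at 16 only leaves the agent where it is, while "Left" (to 15) and "Down" (to 12) both reach a goal.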