Python Forum

General Coding help: Reinforcement learning
I want to implement Q-learning to solve the Taxi problem with an optimal policy. The Taxi problem source code is at https://github.com/openai/gym/blob/maste...xt/taxi.py

import gym
import random
import numpy
import time

env = gym.make("Taxi-v2")
next_state = -1000*numpy.ones((501,6))
next_reward = -1000*numpy.ones((501,6))

#Training

I am new to Python and I want to code this training part. Could someone help me with the code and explain it, so that I can follow the logic as I learn?


Thank you
Typically each square (state) and each move (action) is assigned a point value. If a move takes you to a square with a good value, you increase the point value of that move and of the square it started from. You make many tries at the problem, keeping track of all the point values and moving randomly, weighted by the point values. The more tries you make, the closer your point values converge to the best path.
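
Here is a rough sketch of that idea as tabular Q-learning, in case it helps. To keep it runnable with only the standard library, it does not use gym: the corridor environment, the step() function, the reward values (+20 at the goal, -1 per move) and the settings alpha, gamma and epsilon are all assumptions made up for illustration, not the Taxi environment itself. It also picks moves epsilon-greedily (mostly the best-known move, sometimes a random one) rather than sampling moves weighted by their point values, which is a common substitute.

```python
import random

N_STATES = 6          # hypothetical corridor: squares 0..5, goal at square 5
ACTIONS = (0, 1)      # 0 = step left, 1 = step right
GOAL = N_STATES - 1

def step(state, action):
    """Toy stand-in for an environment step: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = nxt == GOAL
    reward = 20 if done else -1   # assumed rewards: small cost per move, bonus at the goal
    return nxt, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # One point value per (square, move) pair, all starting at zero.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise take the best-known move.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            # Q-learning update: move the value toward
            # reward + discounted best value of the next square.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q_table = train()
# Greedy policy: the best-valued move in each non-goal square.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(policy)
```

After training, the greedy policy should pick action 1 (step right) in every square, since that is the shortest way to the goal. For Taxi the same loop would apply in principle, with the environment's own reset and step calls in place of the toy step() and a table sized to its states and actions.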