Python Forum

I want to implement Q-learning to solve the Taxi problem with an optimal policy. The Taxi problem source code is at https://github.com/openai/gym/blob/maste...xt/taxi.py

import gym
import random
import numpy
import time

env = gym.make("Taxi-v2")

next_state = -1000*numpy.ones((501,6))
next_reward = -1000*numpy.ones((501,6))

#Training

I am new to Python and want to code this training part. Could someone help me with the code and explain it, so that I can follow the logic as I learn?

Thank you

Typically each square and each move is assigned a point value. If you make a move to a square with a good value, you increase the point value for the move and the square the move is from. You make tons of tries at the problem, keeping track of all the point values, and moving randomly, weighted by the point values. The more tries you make, the more your point values converge to the best path.
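To make that concrete, here is a minimal sketch of the idea as tabular Q-learning. It is not the Taxi environment itself: to keep it runnable without gym, it uses a made-up corridor of 6 squares where the only goal is to reach the last square. The names step, train, N_STATES and GOAL, the rewards (-1 per move, +20 at the goal) and the settings alpha, gamma and epsilon are all assumptions chosen for illustration. The "point values" are the entries of the q table, and "moving randomly, weighted by the point values" is approximated with an epsilon-greedy choice, which is one common way to do it.

```python
import random

# Hypothetical stand-in for Taxi: a corridor of 6 squares.
# The agent starts on square 0 and an episode ends on square 5.
N_STATES = 6
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state, action):
    """Return (next_state, reward, done) for one move in the corridor."""
    if action == 1:
        next_state = min(state + 1, GOAL)
    else:
        next_state = max(state - 1, 0)
    done = next_state == GOAL
    reward = 20 if done else -1  # assumed rewards, not taken from taxi.py
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Run many tries and return the learned table of point values."""
    rng = random.Random(seed)
    # q[state][action] is the point value of making that move from that square.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick any move at random.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick the best-valued move, breaking ties at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning update: nudge the value toward
            # reward + gamma * (best value available from the next square).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy policy: the best-valued move for every square before the goal.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(policy)
```

After training, the policy should say "move right" (1) on every square, since that is the shortest way to the goal. For Taxi the loop has the same shape: the table would have one row per state and one column for each of the 6 actions, and step would be replaced by the environment's own step call.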

kala

ichabod801