I am trying to implement Q-learning to solve the Taxi problem and obtain an optimal policy. The Taxi problem source code is at https://github.com/openai/gym/blob/maste...xt/taxi.py
import gym
import random
import numpy
import time

env = gym.make("Taxi-v2")

next_state = -1000*numpy.ones((501,6))
next_reward = -1000*numpy.ones((501,6))
#Training
I am new to Python and would like to code this training part. Could someone help me with the code and explain it, so that I can follow the logic?
Thank you
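
For reference, here is a minimal sketch of what a tabular Q-learning training loop could look like. It is only an illustration under stated assumptions, not the method used in taxi.py. The function name `train_q_learning`, the hyperparameter values, and the small `CorridorEnv` stand-in are all hypothetical and exist only so the sketch runs without gym installed. It keeps a Q-table in plain lists rather than filling the `next_state` and `next_reward` arrays from the setup code, and it assumes the classic gym interface where `reset()` returns a state and `step(action)` returns `(state, reward, done, info)`; newer gym and gymnasium releases return extra values and would need small changes.

```python
import random


class CorridorEnv:
    """Hypothetical stand-in environment (not part of gym).

    States 0..4 in a row; action 0 moves left, action 1 moves right.
    Every step costs -1; reaching state 4 gives +20 and ends the episode.
    """

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == 1:
            self.state = min(self.state + 1, 4)
        else:
            self.state = max(self.state - 1, 0)
        done = self.state == 4
        reward = 20 if done else -1
        return self.state, reward, done, {}


def train_q_learning(env, n_states, n_actions, episodes=2000,
                     alpha=0.1, gamma=0.9, epsilon=0.1,
                     max_steps=200, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    # q[s][a] is the current estimate of the return for action a in state s.
    q = [[0.0] * n_actions for _ in range(n_states)]

    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[state][a])

            next_state, reward, done, _info = env.step(action)

            # Q-learning target: reward plus discounted best next value
            # (no bootstrap term once the episode has ended).
            if done:
                target = reward
            else:
                target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            if done:
                break
    return q
```

With the classic gym API, the same function could presumably be called as `train_q_learning(env, env.observation_space.n, env.action_space.n)` on the Taxi environment, and the learned policy read off by taking the action with the largest Q-value in each state. The values of `alpha`, `gamma`, and `epsilon` above are untuned guesses.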