General Coding help:Reinforcement learning

kala · (This post was last modified: Oct-13-2018, 01:51 PM by buran.)

I want to implement Q-learning to solve the Taxi problem with an optimal policy. The Taxi problem source code is at https://github.com/openai/gym/blob/maste...xt/taxi.py

import gym
import random
import numpy
import time

env = gym.make("Taxi-v2")
next_state = -1000*numpy.ones((501,6))
next_reward = -1000*numpy.ones((501,6))

#Training

I am new to Python and I want to code this training part. Could someone help me with the code and explain it, so that I can follow the logic behind it?

Thank you
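
For reference, the training part asked about above could be sketched roughly as follows. This is only a minimal tabular Q-learning sketch, not a definitive solution. To keep it runnable without gym, it uses a small made-up environment (ChainEnv, hypothetical, not part of gym) that follows the classic gym interface, where reset() returns a state and step(action) returns (next_state, reward, done, info). The function name q_learning and the values of alpha, gamma, epsilon, episodes and max_steps are assumptions chosen for illustration.

```python
import random


class ChainEnv:
    """Hypothetical toy environment (not part of gym) with the classic
    gym interface: reset() -> state, step(a) -> (state, reward, done, info).
    The agent starts at state 0 and must reach the last state."""
    n_states = 5
    n_actions = 2  # 0 = move left, 1 = move right

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == 1:
            self.state += 1
        else:
            self.state = max(0, self.state - 1)
        done = self.state == self.n_states - 1
        # Reward scheme loosely mirrors Taxi: -1 per step, +20 at the goal.
        reward = 20 if done else -1
        return self.state, reward, done, {}


def q_learning(env, n_states, n_actions, episodes=500, alpha=0.1,
               gamma=0.9, epsilon=0.1, max_steps=200, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    # Q-table: one row per state, one column per action, initialised to 0.
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[state][a])
            next_state, reward, done, _ = env.step(action)
            # Q-learning update: move Q(s, a) towards
            # reward + gamma * max_a' Q(s', a'); no bootstrap at terminal states.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = q_learning(ChainEnv(), ChainEnv.n_states, ChainEnv.n_actions)
# Greedy policy for every non-terminal state.
policy = [max(range(ChainEnv.n_actions), key=lambda a: q_table[s][a])
          for s in range(ChainEnv.n_states - 1)]
print(policy)
```

On the toy environment the greedy policy should be to move right in every state. For the Taxi environment from the post, the same function could presumably be called as q_learning(env, 500, 6), since Taxi has 500 states and 6 actions. This assumes the gym version in use follows the classic interface described above; newer releases return different values from reset() and step(), so the loop would need adjusting there.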
