Python Forum
General Coding help: Reinforcement learning
#1
I want to implement Q-learning to solve the Taxi problem and find an optimal policy. The Taxi problem source code is at https://github.com/openai/gym/blob/maste...xt/taxi.py

import gym
import random
import numpy
import time

env = gym.make("Taxi-v2")

next_state = -1000*numpy.ones((501,6))
next_reward = -1000*numpy.ones((501,6))

#Training

I am new to Python and want to code this training part. Could someone help me with the code and explain it, so that I can follow the logic as I learn?


Thank you
#2
Typically each square and each move is assigned a point value. If you make a move to a square with a good value, you increase the point value for the move and the square the move is from. You make tons of tries at the problem, keeping track of all the point values, and moving randomly, weighted by the point values. The more tries you make, the more your point values converge to the best path.
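As a rough illustration of that idea, here is a minimal sketch of tabular Q-learning. It does not use the gym Taxi environment: the corridor world, the `step`, `train` and `greedy_policy` functions, and all hyperparameter values are made up for this example. It also uses epsilon-greedy exploration (mostly take the best-known move, sometimes a random one) rather than sampling moves in proportion to their point values, which is a common substitute.

```python
import random

# Hypothetical toy environment (NOT the gym Taxi env): a corridor of
# N_STATES squares. The agent starts on square 0 and the episode ends
# when it reaches the last square.
N_STATES = 6
ACTIONS = [0, 1]  # 0 = move left, 1 = move right


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    if action == 1:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == N_STATES - 1
    reward = 10 if done else -1  # small penalty per move, bonus at the goal
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Fill in a table of point values q[state][action] by trial and error."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Q-learning update: nudge the value of (state, action) toward
            # the reward plus the discounted best value of the next square.
            best_next = 0.0 if done else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action for each square according to the table."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]


if __name__ == "__main__":
    table = train()
    print(greedy_policy(table))
```

For the Taxi problem the same loop should carry over, with the table sized to Taxi's 500 states and 6 actions and `step(state, action)` replaced by the environment's own `env.reset()` and `env.step(action)`. The exact values those calls return differ between gym versions, so check the version you have installed.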
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.