View Related Documents

Abstract

To solve the problem of tradeoff between exploration and exploitation actions in reinforcement learning, the authors have proposed two-dimensional evaluation reinforcement learning, which distinguishes between reward and punishment evaluation forecasts. The proposed method use the difference between reward evaluation and punishment evaluation as a factor for determining the action and the sum as a parameter for determining the ratio of exploration to exploitation. In this paper we described an experiment with a mobile robot searching for a path and the subsequent conflict between exploration and exploitation actions. The results of the experiment prove that using the proposed method of reinforcement learning using the tw o dimensions of reward and punishment can generate a better path than using the conventional reinforcement learning method.

Fulltext Preview

Image of the first page of the fulltext document