To address the tradeoff between exploration and exploitation in reinforcement learning, the authors have
proposed two-dimensional evaluation reinforcement learning, which maintains separate evaluation forecasts for reward and punishment.
The proposed method uses the difference between the reward evaluation and the punishment evaluation as the factor that determines the
action, and their sum as the parameter that determines the ratio of exploration to exploitation. In this paper we describe an
experiment in which a mobile robot searches for a path and faces the resulting conflict between exploration and exploitation actions.
The results of the experiment show that the proposed method, which uses the two dimensions of
reward and punishment, can generate a better path than the conventional reinforcement learning method.
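
The action-selection rule summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each action has a non-negative reward evaluation and a non-negative punishment evaluation, and that the exploration ratio shrinks as their sum grows, through the hypothetical mapping 1 / (1 + sum). The abstract states only that the sum determines the ratio, so the mapping itself and the function names `exploration_rate`, `greedy_action`, and `select_action` are assumptions made for illustration.

```python
import random


def exploration_rate(reward_eval, punish_eval):
    """Map the summed evaluations to a probability of exploring.

    ASSUMPTION: a larger sum means more accumulated evidence, so less
    exploration. The 1 / (1 + sum) form is illustrative only.
    """
    total = sum(r + p for r, p in zip(reward_eval, punish_eval))
    return 1.0 / (1.0 + total)


def greedy_action(reward_eval, punish_eval):
    """Return the action whose reward-minus-punishment difference is largest."""
    diffs = [r - p for r, p in zip(reward_eval, punish_eval)]
    return max(range(len(diffs)), key=diffs.__getitem__)


def select_action(reward_eval, punish_eval, rng=random):
    """Explore with the sum-derived probability, otherwise exploit the difference."""
    if rng.random() < exploration_rate(reward_eval, punish_eval):
        return rng.randrange(len(reward_eval))  # exploration: uniform random action
    return greedy_action(reward_eval, punish_eval)  # exploitation


if __name__ == "__main__":
    # Hypothetical evaluations for two actions in a single state.
    reward = [1.0, 3.0]
    punish = [0.5, 3.5]
    print("greedy action:", greedy_action(reward, punish))
    print("exploration rate:", exploration_rate(reward, punish))
    print("selected action:", select_action(reward, punish, random.Random(0)))
```

In this sketch the second action has the larger reward evaluation, but its punishment evaluation is larger still, so the difference favors the first action. A single scalar value per action could not separate "high reward, high punishment" from "little experience", which is the distinction the two-dimensional evaluation is meant to preserve.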