Lecture Notes in Computer Science, 2001, Volume 1828/2001, 213-240, DOI: 10.1007/3-540-44565-X_10

Sequential Decision Making Based on Direct Search

Jürgen Schmidhuber

Abstract

The most challenging open issues in sequential decision making include partial observability of the decision maker’s environment, hierarchical and other types of abstract credit assignment, the learning of credit assignment algorithms, and exploration without a priori world models. I will summarize why direct search (DS) in policy space provides a more natural framework for addressing these issues than reinforcement learning (RL) based on value functions and dynamic programming. Then I will point out fundamental drawbacks of traditional DS methods in the case of stochastic environments, stochastic policies, and unknown temporal delays between actions and observable effects. I will discuss a remedy called the success-story algorithm, show how it can outperform traditional DS, and mention a relationship to market models that combine certain aspects of DS and traditional RL.
