The most challenging open issues in sequential decision making include partial observability of the decision maker’s environment,
hierarchical and other types of abstract credit assignment, the learning of credit assignment algorithms, and exploration
without a priori world models. I will summarize why direct search (DS) in policy space provides a more natural framework for addressing these
issues than reinforcement learning (RL) based on value functions and dynamic programming. Then I will point out fundamental
drawbacks of traditional DS methods in the case of stochastic environments, stochastic policies, and unknown temporal delays between
actions and observable effects. I will discuss a remedy called the success-story algorithm, show how it can outperform traditional
DS, and mention a relationship to market models that combine certain aspects of DS and traditional RL.