We describe a general method for transforming a non-Markovian sequential decision problem into a supervised learning problem using
a K-best-paths algorithm. We consider an application in financial portfolio management in which a controller is trained to directly
optimize the Sharpe ratio (or another risk-averse, non-additive utility function). We illustrate the approach with
experimental results using a kernel-based controller architecture that would not normally be considered in traditional reinforcement
learning or approximate dynamic programming.
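
To make the term "non-additive" concrete, the following minimal sketch (not taken from the experiments reported here) computes a Sharpe ratio over a return sequence and compares it with the sum of the Sharpe ratios of its two halves. The helper name `sharpe_ratio`, the use of the population standard deviation, the zero risk-free rate, and the return values are all illustrative assumptions.

```python
import math

def sharpe_ratio(returns):
    """Mean return divided by the population standard deviation of returns.

    Assumes a zero risk-free rate and at least two distinct return values.
    """
    n = len(returns)
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / n
    return mean / math.sqrt(variance)

# Hypothetical per-period portfolio returns, split into two sub-periods.
first_half = [0.02, -0.01, 0.03]
second_half = [0.01, 0.04, -0.02]
whole = first_half + second_half

sr_whole = sharpe_ratio(whole)
sr_sum = sharpe_ratio(first_half) + sharpe_ratio(second_half)

# The utility of the whole trajectory is not the sum of sub-period utilities,
# so it cannot be written as a sum of per-step rewards.
print(f"Sharpe ratio of whole sequence: {sr_whole:.4f}")
print(f"Sum of sub-period Sharpe ratios: {sr_sum:.4f}")
```

Because the value of the full trajectory does not decompose into per-step terms, the utility depends on the whole path, which is what makes the decision problem non-Markovian in the sense used above.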