Open Theoretical Questions in Reinforcement Learning

Richard S. Sutton

AT&T Labs, Florham Park, NJ 07932, USA
Abstract
Reinforcement learning (RL) concerns the problem of a learning agent interacting with its environment to achieve a goal. Instead of being given examples of desired behavior, the learning agent must discover by trial and error how to behave in order to get the most reward. The environment is a Markov decision process (MDP) with state set $\mathcal{S}$ and action set $\mathcal{A}$. The agent and the environment interact in a sequence of discrete steps, t = 0, 1, 2, ... The state and action at one time step, $s_t \in \mathcal{S}$ and $a_t \in \mathcal{A}$, determine the probability distribution for the state at the next time step, $s_{t+1} \in \mathcal{S}$, and, jointly, the distribution for the next reward, $r_{t+1} \in \Re$. The agent's objective is to choose each $a_t$ to maximize the subsequent return:
$$
R_t  = \sum\limits_{k = 0}^\infty  {\gamma ^k r_{t + 1 + k} ,} 
$$
where the discount rate, 0 ≤ γ ≤ 1, determines the relative weighting of immediate and delayed rewards. In some environments, the interaction consists of a sequence of episodes, each starting in a given state and ending upon arrival in a terminal state, terminating the series above. In other cases the interaction is continual, without interruption, and the sum may have an infinite number of terms (in which case we usually assume γ < 1). Infinite horizon cases with γ = 1 are also possible though less common (e.g., see Mahadevan, 1996).
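As a small illustration of the return defined above, the following Python sketch computes $R_t$ for a finite (episodic) reward sequence. It is not part of the original chapter: the function name `discounted_return` and the sample rewards are hypothetical, and the list `rewards` is assumed to hold $r_{t+1}, r_{t+2}, \ldots$ up to the terminal state.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Compute sum_k gamma**k * rewards[k], where rewards[k] stands for r_{t+1+k}.

    Illustrative helper (hypothetical name); assumes a finite, episodic sequence.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Work backwards through the episode: R_t = r_{t+1} + gamma * R_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episode: three rewards received after time t, then termination.
episode_rewards = [1.0, 0.0, 2.0]
print(discounted_return(episode_rewards, gamma=0.5))  # 1 + 0.5*0 + 0.25*2 = 1.5
```

With γ = 0 only the immediate reward counts, and with γ = 1 the return is the plain sum of rewards. In the continual case with a constant reward r and γ < 1, the infinite sum converges to r / (1 − γ), which is why γ < 1 is the usual assumption there.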

Contact Information: Richard S. Sutton
Email: sutton@research.att.com
URL: www.cs.umass.edu/~rich