Propagation of Q-values in Tabular TD(λ)

Philippe Preux

(2)  Laboratoire d’Informatique du Littoral, UPRES-EA 2335, Université du Littoral Côte d’Opale, BP 719, 62228 Calais Cedex, France
Abstract
In this paper, we propose a new idea for the tabular TD(λ) algorithm. In TD learning, rewards are propagated along the sequence of state/action pairs that have been visited recently. In addition to this, we propose to propagate rewards to state/action pairs that neighbor this sequence, even though they have not been visited. This greatly reduces the number of iterations TD(λ) needs before it can generalize, since a state/action pair no longer has to be visited for its Q-value to be updated. This propagation process brings tabular TD(λ) closer to neural-net-based TD(λ) with regard to its ability to generalize, while leaving the other properties of tabular TD(λ) unchanged.
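
The abstract does not say how neighbors are defined or how strongly an update is passed on to them, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a SARSA(λ)-style update with accumulating eligibility traces, a hypothetical 1-D chain of states in which the neighbors of state s are s-1 and s+1, and a hypothetical attenuation factor `spread` applied to unvisited neighboring pairs. The names `neighbors_1d`, `sarsa_lambda_step` and `spread` are illustrative and do not come from the paper.

```python
import numpy as np


def neighbors_1d(s, n_states):
    """Hypothetical neighborhood: adjacent states on a 1-D chain."""
    return [t for t in (s - 1, s + 1) if 0 <= t < n_states]


def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next,
                      alpha=0.5, gamma=0.9, lam=0.8, spread=0.5):
    """One tabular SARSA(lambda) step with propagation to unvisited neighbors.

    Q and E are float arrays of shape (n_states, n_actions), modified in place.
    `spread` is an assumed attenuation factor, not a value from the paper.
    """
    n_states = Q.shape[0]

    # Standard TD error and accumulating trace for the visited pair.
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]
    E[s, a] += 1.0

    # Assumption: every pair with a non-zero trace lends an attenuated
    # share of its trace to neighboring pairs that have no trace of their own.
    E_spread = E.copy()
    for i, j in zip(*np.nonzero(E)):
        for t in neighbors_1d(i, n_states):
            if E[t, j] == 0.0:
                E_spread[t, j] = max(E_spread[t, j], spread * E[i, j])

    # Visited pairs get the usual update; unvisited neighbors get a weaker one.
    Q += alpha * delta * E_spread

    # Only the real traces decay; borrowed traces are recomputed each step.
    E *= gamma * lam
    return delta


Q = np.zeros((5, 2))
E = np.zeros((5, 2))
delta = sarsa_lambda_step(Q, E, s=2, a=0, r=1.0, s_next=3, a_next=0)
```

In this sketch, a single rewarded visit to the pair (2, 0) also changes the Q-values of (1, 0) and (3, 0), which were never visited; this is the kind of generalization the abstract describes. How the neighborhood and the attenuation should actually be chosen is left open here.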

Contact Information Philippe Preux
Email: philippe.preux@lil.univ-littoral.fr
 