Trial-based heuristic tree search for finite horizon MDPs

Keller, Thomas and Helmert, Malte. (2013) Trial-based heuristic tree search for finite horizon MDPs. In: Proceedings of the 23rd International Conference on Automated Planning and Scheduling (ICAPS 2013) : [held 10 - 14 June 2013 in Rome, Italy]. Palo Alto, Calif., pp. 135-143.

PDF - Published Version

Official URL: http://edoc.unibas.ch/dok/A6212201

Downloads: Statistics Overview


Dynamic programming is a well-known approach for solving MDPs. In large state spaces, asynchronous versions like Real-Time Dynamic Programming have been applied successfully. If unfolded into equivalent trees, Monte-Carlo Tree Search algorithms are a valid alternative. UCT, the most popular representative, obtains good anytime behavior by guiding the search towards promising areas of the search tree. The Heuristic Search algorithm AO∗ finds optimal solutions for MDPs that can be represented as acyclic AND/OR graphs. We introduce a common framework, Trial-based Heuristic Tree Search, that subsumes these approaches and distinguishes them based on five ingredients: heuristic function, backup function, action selection, outcome selection, and trial length. Using this framework, we describe three new algorithms which mix these ingredients in novel ways in an attempt to combine their different strengths. Our evaluation shows that two of our algorithms not only provide superior theoretical properties to UCT, but also outperform state-of-the-art approaches experimentally.
Faculties and Departments:05 Faculty of Science > Departement Mathematik und Informatik > Informatik > Artificial Intelligence (Helmert)
UniBasel Contributors:Helmert, Malte
Item Type:Conference or Workshop Item, refereed
Conference or workshop item Subtype:Conference Paper
Publisher:AAAI Press
Note:Publication type according to Uni Basel Research Database: Conference paper
Related URLs:
edoc DOI:
Last Modified:19 Nov 2018 15:24
Deposited On:23 May 2014 08:34

Repository Staff Only: item control page