## Ciamac Moallemi

*Universal reinforcement learning*

Coauthor(s): Vivek Farias, Benjamin Van Roy, Tsachy Weissman.

#### Download:

Adobe Acrobat PDF

**Abstract:**

We consider an agent interacting with an unmodeled environment. At each time, the agent makes an observation, takes an action, and incurs a cost. Its actions can influence future observations and costs. The goal is to minimize the long-term average cost. We propose a novel algorithm, known as the active LZ algorithm, for optimal control based on ideas from the Lempel-Ziv scheme for universal data compression and prediction. We establish that, under the active LZ algorithm, if there exists an integer *K* such that the future is conditionally independent of the past given a window of *K* consecutive actions and observations, then the average cost converges to the optimum. Experimental results involving the game of Rock-Paper-Scissors illustrate merits of the algorithm.

**Source:** *IEEE Transactions on Information Theory*

**Exact Citation:**

Farias, Vivek, Ciamac Moallemi, Benjamin Van Roy, and Tsachy Weissman. "Universal reinforcement learning." *IEEE Transactions on Information Theory* 56, no. 5 (May 2010): 2441-2454.

**Volume:** 56

**Number:** 5

**Pages:** 2441-2454

**Date:**
5
2010