Approximate dynamic programming with a fuzzy parameterization
Introduction
Dynamic programming (DP) is a powerful paradigm for solving
optimal control problems, thanks to its mild assumptions
on the controlled process, which can be nonlinear or stochastic
(Bertsekas, 2007; Bertsekas & Tsitsiklis, 1996). In the DP framework,
a model of the process is assumed to be available, and the
immediate performance is measured by a scalar reward signal. The
controller then maximizes the long-term performance, measured
by the cumulative reward. DP algorithms can be extended to work
without requiring a model of the process, in which case they are
usually called reinforcement learning (RL) algorithms (Sutton &
Barto, 1998).
Markov decision processes and Q-iteration
This section introduces deterministic Markov decision processes
(MDPs) and characterizes their optimal solution (Bertsekas,
2007; Sutton & Barto, 1998). Afterwards, exact and approximate
Q-iteration are presented.
A deterministic MDP consists of the state space X, the action
space U, the transition function f : X × U → X, and the reward
function ρ : X × U → R. As a result of the control action u_k
applied in the state x_k, the state changes to x_{k+1} = f(x_k, u_k) and
a scalar reward r_{k+1} = ρ(x_k, u_k) is generated, which evaluates the
immediate effect of action u_k (the transition from x_k to x_{k+1}). The
state and action spaces can be continuous or discrete. We assume
that ‖ρ‖_∞ = sup_{x,u} |ρ(x, u)| is finite. Actions are chosen according
to the policy h : X → U, which is a discrete-time state feedback
u_k = h(x_k).
The goal is to find an optimal policy, i.e., one that maximizes,
starting from the current moment in time (k = 0) and from any
initial state x_0, the discounted return:

R^h(x_0) = Σ_{k=0}^∞ γ^k ρ(x_k, h(x_k)), where x_{k+1} = f(x_k, h(x_k))

and γ ∈ [0, 1) is the discount factor. Since ‖ρ‖_∞ is finite, the return
is bounded by ‖ρ‖_∞ / (1 − γ). The optimal Q-function Q∗ gives, for every
pair (x, u), the best return obtainable after first taking action u in
state x; it satisfies the Bellman equation Q∗(x, u) = ρ(x, u) +
γ sup_{u'} Q∗(f(x, u), u'), and exact Q-iteration computes Q∗ by
repeatedly applying this equation as an update, starting from an
arbitrary initial Q-function.
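To make these definitions concrete, here is a minimal Python sketch of a toy deterministic MDP together with a truncated estimate of the discounted return. The dynamics f, reward ρ, policy h, and discount factor below are illustrative assumptions, not taken from the paper.

[code]
import numpy as np

GAMMA = 0.95  # discount factor gamma in [0, 1)

def f(x, u):
    """Transition function f : X x U -> X (hypothetical toy dynamics)."""
    return float(np.clip(x + 0.1 * u, -1.0, 1.0))

def rho(x, u):
    """Reward function rho : X x U -> R; bounded, so ||rho||_inf is finite."""
    return -x**2 - 0.01 * u**2

def h(x):
    """State-feedback policy u_k = h(x_k): push the state toward the origin."""
    return -1.0 if x > 0 else (1.0 if x < 0 else 0.0)

def discounted_return(x0, horizon=500):
    """Truncated estimate of R^h(x0) = sum_k gamma^k rho(x_k, h(x_k)).
    The truncation error is at most gamma^horizon * ||rho||_inf / (1 - gamma)."""
    x, ret = x0, 0.0
    for k in range(horizon):
        u = h(x)
        ret += GAMMA**k * rho(x, u)
        x = f(x, u)
    return ret

print(discounted_return(0.8))  # return of policy h from x0 = 0.8
[/code]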
4. Fuzzy Q-iteration
In this section, the fuzzy Q-iteration algorithm is introduced.
First, the fuzzy approximation and projection mappings are
described, followed by synchronous and asynchronous fuzzy Q-iteration.
The state space X and the action space U of the MDP may
be either continuous or discrete, but they are assumed to be subsets
of Euclidean spaces, such that the 2-norm of the states and actions
is well-defined.
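The excerpt stops short of the construction itself, so the sketch below is only one plausible instance consistent with this description, not the paper's exact formulation: triangular membership functions φ_i with cores x_i form a fuzzy partition of X, the actions are restricted to a finite set {u_j}, the Q-function is approximated as Q̂(x, u_j) = Σ_i φ_i(x) θ_{i,j}, and the projection sets each parameter θ_{i,j} from the Q-value at the corresponding core. It reuses the toy MDP (f, rho, GAMMA) assumed in the previous snippet; the cores, action set, and stopping tolerance are likewise illustrative.

[code]
def triangular_mfs(cores):
    """Triangular membership functions phi_i on a 1-D state space: phi_i(x_i) = 1
    at the core x_i, and at every x the memberships sum to 1 (a fuzzy partition)."""
    def phi(x):
        w = np.zeros(len(cores))
        if x <= cores[0]:
            w[0] = 1.0
        elif x >= cores[-1]:
            w[-1] = 1.0
        else:
            i = np.searchsorted(cores, x) - 1
            t = (x - cores[i]) / (cores[i + 1] - cores[i])
            w[i], w[i + 1] = 1.0 - t, t
        return w
    return phi

cores = np.linspace(-1.0, 1.0, 11)    # membership-function cores x_i
actions = np.array([-1.0, 0.0, 1.0])  # discretized action set {u_j}
phi = triangular_mfs(cores)

def fuzzy_q_iteration(tol=1e-8, max_iters=1000):
    """Synchronous fuzzy Q-iteration: compute every updated parameter from the
    previous parameter matrix, and stop once a sweep is a near-fixed point."""
    theta = np.zeros((len(cores), len(actions)))
    for _ in range(max_iters):
        new_theta = np.empty_like(theta)
        for i, xi in enumerate(cores):
            for j, uj in enumerate(actions):
                q_next = phi(f(xi, uj)) @ theta  # approximate Q at the successor
                new_theta[i, j] = rho(xi, uj) + GAMMA * q_next.max()
        if np.max(np.abs(new_theta - theta)) < tol:
            return new_theta
        theta = new_theta
    return theta

theta = fuzzy_q_iteration()
x = 0.5  # greedy action at a state x: maximize the approximate Q-values
print(actions[np.argmax(phi(x) @ theta)])
[/code]

An asynchronous variant would write each updated theta[i, j] back into theta immediately, so that later updates within the same sweep already use it.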
5. Analysis of fuzzy Q-iteration
In Section 5.1, we show that synchronous and asynchronous
fuzzy Q-iteration are convergent and we characterize the suboptimality
of their solution. In Section 5.2, we show that fuzzy
Q-iteration is consistent, i.e., that its solution asymptotically converges
to Q∗ as the approximation accuracy increases. These
results show that fuzzy Q-iteration is a theoretically sound algorithm.
Section 5.3 examines the computational complexity of fuzzy
Q-iteration.
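As a rough numerical companion to the convergence claim, again on the hypothetical toy problem assumed above rather than the paper's benchmarks, one can track the sup-norm change of the parameter matrix across synchronous sweeps; for a γ-contraction the successive differences shrink at least geometrically with ratio γ.

[code]
# Reuses f, rho, GAMMA, cores, actions, and phi from the sketches above.
theta = np.zeros((len(cores), len(actions)))
prev_diff = None
for sweep in range(25):
    new_theta = np.empty_like(theta)
    for i, xi in enumerate(cores):
        for j, uj in enumerate(actions):
            q_next = phi(f(xi, uj)) @ theta
            new_theta[i, j] = rho(xi, uj) + GAMMA * q_next.max()
    diff = np.max(np.abs(new_theta - theta))  # sup-norm change this sweep
    ratio = diff / prev_diff if prev_diff else float("nan")
    print(f"sweep {sweep:2d}  change = {diff:.2e}  ratio = {ratio:.3f}")
    theta, prev_diff = new_theta, diff
[/code]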