
seminar paper · 13-03-2012, 05:05 PM

Approximate dynamic programming with a fuzzy parameterization


Introduction
Dynamic programming (DP) is a powerful paradigm for solving
optimal control problems, thanks to its mild assumptions
on the controlled process, which can be nonlinear or stochastic
(Bertsekas, 2007; Bertsekas & Tsitsiklis, 1996). In the DP framework,
a model of the process is assumed to be available, and the
immediate performance is measured by a scalar reward signal. The
controller then maximizes the long-term performance, measured
by the cumulative reward. DP algorithms can be extended to work
without requiring a model of the process, in which case they are
usually called reinforcement learning (RL) algorithms (Sutton &
Barto, 1998).

Markov decision processes and Q-iteration
This section introduces deterministic Markov decision processes
(MDPs) and characterizes their optimal solution (Bertsekas,
2007; Sutton & Barto, 1998). Afterwards, exact and approximate
Q-iteration are presented.
A deterministic MDP consists of the state space X, the action
space U, the transition function f : X × U → X, and the reward
function ρ : X × U → ℝ. As a result of the control action u_k
applied in the state x_k, the state changes to x_{k+1} = f(x_k, u_k) and
a scalar reward r_{k+1} = ρ(x_k, u_k) is generated, which evaluates the
immediate effect of action u_k (the transition from x_k to x_{k+1}). The
state and action spaces can be continuous or discrete. We assume
that ‖ρ‖_∞ = sup_{x,u} |ρ(x, u)| is finite. Actions are chosen according
to the policy h : X → U, which is a discrete-time state feedback
u_k = h(x_k).
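
As an illustration of these definitions, the following minimal Python sketch simulates a deterministic MDP under a fixed policy. The toy problem (five states on a line, actions -1 and +1, a reward of 1 for landing in state 4) and the names `rollout`, `f`, `rho`, and `h` are assumptions made for this example only; they do not come from the paper.

```python
def rollout(f, rho, h, x0, steps):
    """Simulate x_{k+1} = f(x_k, u_k) with u_k = h(x_k).

    Returns a list of (x_k, u_k, r_{k+1}) tuples, where
    r_{k+1} = rho(x_k, u_k) is the reward for the transition.
    """
    trajectory = []
    x = x0
    for _ in range(steps):
        u = h(x)          # state feedback: action chosen by the policy
        r = rho(x, u)     # scalar reward for applying u in x
        trajectory.append((x, u, r))
        x = f(x, u)       # deterministic transition
    return trajectory


# Hypothetical toy MDP: states 0..4, actions -1 or +1.
def f(x, u):
    return min(max(x + u, 0), 4)


def rho(x, u):
    # Reward 1 when the transition lands in state 4, else 0.
    return 1.0 if f(x, u) == 4 else 0.0


def h(x):
    # Hypothetical policy: always move right.
    return 1


traj = rollout(f, rho, h, x0=2, steps=3)
```

Because the reward here is bounded by 1, the assumption that the supremum of |ρ(x, u)| is finite holds trivially for this toy problem.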
The goal is to find an optimal policy, i.e., one that maximizes,
starting from the current moment in time (k = 0) and from any
initial state x_0, the discounted return:
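
For reference, the discounted return is conventionally written as below. This is a sketch in standard notation, not a quotation from the paper: the discount factor γ ∈ [0, 1) and the symbol R^h are assumptions, since neither is defined in this excerpt.

```latex
R^{h}(x_0) = \sum_{k=0}^{\infty} \gamma^{k} r_{k+1}
           = \sum_{k=0}^{\infty} \gamma^{k} \rho\bigl(x_k, h(x_k)\bigr),
\qquad x_{k+1} = f\bigl(x_k, h(x_k)\bigr).
```

With |ρ(x, u)| bounded and γ < 1, this sum is finite for every policy h and every initial state x_0.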

Fuzzy Q-iteration
In this section, the fuzzy Q-iteration algorithm is introduced.
First, the fuzzy approximation and projection mappings are
described, followed by synchronous and asynchronous fuzzy Q-iteration.
The state space X and the action space U of the MDP may
be either continuous or discrete, but they are assumed to be subsets
of Euclidean spaces, such that the 2-norm of the states and actions
is well-defined.
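
The excerpt does not reproduce the membership functions or the update rule, so the following Python sketch shows only one plausible form of synchronous fuzzy Q-iteration under stated assumptions: a one-dimensional state, triangular membership functions centered at `centers`, a finite set of discrete actions, a discount factor `gamma`, and the update θ[i, j] ← ρ(x_i, u_j) + γ max_j' Σ_i' φ_i'(f(x_i, u_j)) θ[i', j']. All function and variable names are hypothetical.

```python
import numpy as np


def triangular_memberships(x, centers):
    """Membership degrees of scalar x in triangular fuzzy sets.

    `centers` must be sorted. The degrees are non-negative and sum to 1;
    x is clipped to the range covered by the centers.
    """
    x = float(np.clip(x, centers[0], centers[-1]))
    phi = np.zeros(len(centers))
    i = int(np.searchsorted(centers, x, side="right")) - 1
    if i >= len(centers) - 1:          # x sits on the last center
        phi[-1] = 1.0
        return phi
    w = (x - centers[i]) / (centers[i + 1] - centers[i])
    phi[i], phi[i + 1] = 1.0 - w, w    # linear interpolation weights
    return phi


def fuzzy_q_iteration(f, rho, centers, actions, gamma, tol=1e-8, max_iter=10_000):
    """Synchronous fuzzy Q-iteration sketch; returns parameters theta[i, j]."""
    n, m = len(centers), len(actions)
    # Memberships of each next state f(x_i, u_j): shape (n, m, n).
    phi_next = np.array(
        [[triangular_memberships(f(x, u), centers) for u in actions] for x in centers]
    )
    rewards = np.array([[rho(x, u) for u in actions] for x in centers])
    theta = np.zeros((n, m))
    for _ in range(max_iter):
        # q_next[i, j, j'] approximates Q(f(x_i, u_j), u_j').
        q_next = phi_next @ theta
        theta_new = rewards + gamma * q_next.max(axis=2)
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new
    return theta


# Hypothetical toy problem: state in [0, 4], actions -1 or +1.
def f(x, u):
    return float(np.clip(x + u, 0.0, 4.0))


def rho(x, u):
    return 1.0 if f(x, u) >= 4.0 else 0.0


centers = np.linspace(0.0, 4.0, 5)
actions = [-1.0, 1.0]
theta = fuzzy_q_iteration(f, rho, centers, actions, gamma=0.5)
greedy = [actions[j] for j in theta.argmax(axis=1)]
```

In this sketch every parameter is updated from the previous iterate, which corresponds to the synchronous variant; an asynchronous variant would instead overwrite `theta` in place as each entry is computed.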

Analysis of fuzzy Q-iteration
In Section 5.1, we show that synchronous and asynchronous
fuzzy Q-iteration are convergent and we characterize the suboptimality
of their solution. In Section 5.2, we show that fuzzy
Q-iteration is consistent, i.e., that its solution asymptotically converges
to Q∗ as the approximation accuracy increases. These
results show that fuzzy Q-iteration is a theoretically sound algorithm.
Section 5.3 examines the computational complexity of fuzzy
Q-iteration.
