This work concerns Markov decision chains with finite state and action sets. The transition law satisfies the simultaneous Doeblin condition but is unknown to the controller, and the problem of determining an optimal adaptive policy with respect to the average reward criterion is addressed. A subset of policies is identified such that, when the system evolves under a policy in that class, the frequency estimators of the transition law are consistent on an essential set of admissible state-action pairs. The non-stationary value iteration scheme is then used to select an optimal adaptive policy within that family.
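For intuition only, here is a minimal Python sketch (not the paper's construction) combining the two ingredients named above: frequency estimation of an unknown transition law and a non-stationary value iteration step that picks the next action. All names (run_adaptive_nvi, true_P, reward) are illustrative assumptions.

import numpy as np

def run_adaptive_nvi(true_P, reward, horizon, seed=0):
    """Hedged sketch, not the paper's scheme: the controller never sees
    true_P; it re-estimates the transition law from visit frequencies
    and takes one value-iteration step per stage with that estimate."""
    rng = np.random.default_rng(seed)
    S, A, _ = true_P.shape
    counts = np.ones((S, A, S))   # smoothed visit counts -> frequency estimator
    v = np.zeros(S)               # running relative value function
    s = 0
    for _ in range(horizon):
        P_hat = counts / counts.sum(axis=2, keepdims=True)  # consistent only under sufficient exploration
        q = reward + P_hat @ v    # one non-stationary value-iteration step
        w = q.max(axis=1)
        v = w - w[0]              # relative values (average-reward normalisation)
        a = int(q[s].argmax())    # greedy adaptive action
        s_next = rng.choice(S, p=true_P[s, a])
        counts[s, a, s_next] += 1
        s = s_next
    return v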
We analyse a Markov chain and perturbations of its transition probability and of the one-step cost function (possibly unbounded) defined on it. Under conditions of Lyapunov and Harris type, we obtain new estimates of the effects of such perturbations via an index of perturbations, defined as the difference between the total expected discounted costs of the original Markov chain and the perturbed one. We provide an example that illustrates our analysis.
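In symbols (the notation below is an illustrative reading of the abstract, with discount factor \alpha, kernel P, one-step cost c, and perturbed counterparts \widetilde{P}, \widetilde{c}), the index of perturbations at an initial state x can be written as

\[
V_\alpha^{P,c}(x) \;=\; \mathbb{E}_x^{P}\Bigl[\sum_{t=0}^{\infty} \alpha^{t}\, c(x_t)\Bigr],
\qquad
\mathcal{I}(x) \;=\; \bigl| V_\alpha^{P,c}(x) - V_\alpha^{\widetilde{P},\widetilde{c}}(x) \bigr|,
\]

where the Lyapunov and Harris type conditions are what keep \mathcal{I}(x) finite and estimable even when c is unbounded.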
We extend previous results of the same authors ([11]) on the effects of perturbations of the transition probability of a Markov cost chain to discounted Markov control processes. Assuming that conditions of Lyapunov and Harris type hold for each stationary policy, we obtain upper bounds for the index of perturbations, defined as the difference between the total expected discounted costs of the original Markov control process and the perturbed one. We present examples that satisfy our conditions.
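As a numerical illustration under the setting above (the function names and the finite-state restriction are our assumptions, not the paper's): for a fixed stationary policy f the discounted cost solves the linear system (I - \alpha P_f) V_f = c_f, so a finite-state analogue of the index for that policy is the sup-norm gap between the original and perturbed solutions.

import numpy as np

def discounted_cost(P_f, c_f, alpha):
    # V_f = sum_t alpha^t P_f^t c_f is the unique solution of
    # (I - alpha * P_f) V_f = c_f for a discount factor alpha in (0, 1).
    S = len(c_f)
    return np.linalg.solve(np.eye(S) - alpha * P_f, c_f)

def perturbation_index(P_f, P_f_tilde, c_f, alpha):
    # Sup-norm difference of the discounted costs under the original and
    # perturbed kernels: an illustrative finite-state version of the index.
    return np.max(np.abs(discounted_cost(P_f, c_f, alpha)
                         - discounted_cost(P_f_tilde, c_f, alpha)))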