Halmstad 2006
Dynamic programing and Markov chains

Jean-Philippe CHANCELIER

Version pdf de ce document
Version sans bandeaux

Table des matières

1 Stoping time problem in infinite horizon

1 Stoping time problem in infinite horizon

Let $(X_{n})_{n \in {\mathbb{N}}}$ be a finite homogeneous Markov chain with transition matrix . We want to compute :

$\displaystyle v(x)=\sup_{\tau \mbox{s.t}} {\mathbb{E}} \left[ \frac{1}{(1+r)^{\tau+1}} f( X_\tau) \vert X_0=x \right].$

the value function can be recursively computed by here can be computed as a fixed point

$\displaystyle v(x) = \frac{1}{1+r} \max( M v(x), f(x))$

(1)

In order to numerically solve such a problem we will use a value iteration method and a policy iteration method.

we will use on of the two following routines to generate a stochastic matrix.

function M= Mtrans(n)
  M=rand(n,n); 
  s=sum(M,'c'); 
  M= M./(s*ones(1,n)); 
endfunction

function M= Mprom(n,d) 
  A= 2*diag(ones(1,n)) -diag(ones(1,n-1),1) -diag(ones(1,n-1),-1);
  A(1,2) = 2*A(1,2) ;
  A($,$-1)=2*  A($,$-1);
  M = eye(A) - A*d
endfunction

Question 1 Using one of the previous function, choose a state space dimension, generate a transition matrix and write a function for f. build a column vector giving the states of the Markov chain.

Question 2 The value iteration méthod to solve the fixed point problem work as follows. Choose an initial vector v0=v which gives for each of the n possible states. Then iterate $v_{n+1}=\frac{1}{1+r} \max( M v_n, f)$ . Stop the alogorithm when the infinite norm of $v_{n+1}-v_{n}$ is small enough. Then, plot the evolution of norm $(v_{n+1}-v_{n})$ during the evolution of the algorithm. Run the same problem with different values of .

Question 3 The policy iteration algorithm evolves as follow. Choose a first value for v0=v. Then given compute a new policy, $u_n{x} =$ 1 if $Mv_n(x) \ge f(x)$ and $u_n{x} =$ 2 elsewhere. Given a new policy , the value function $v_{n+1}(x)$ is computed as follows, whe must have $v_{n+1}(x)= Mv(x)/(1+r)$ for all the states for which the policy is equal to and $v_{n+1}(x)=f(x)$ for all the states for which the policy is equal to . Thus we just have to write and solve a linear system to compute $v_{n+1}$ . As for the value iteration, Stop the alogorithm when the infinite norm of $v_{n+1}-v_{n}$ is small enough. Then, plot the evolution of norm $(v_{n+1}-v_{n})$ during the evolution of the algorithm and compare the two algorithms. Run the same problem with different values of .

Halmstad 2006 Dynamic programing and Markov chains

Table des matières

1 Stoping time problem in infinite horizon

Halmstad 2006
Dynamic programing and Markov chains