Halmstad 2006
Dynamic programing and Markov chains

Jean-Philippe CHANCELIER
Version pdf de ce document
Version sans bandeaux

Table des matières

1 Stoping time problem in infinite horizon

Let $ (X_{n})_{n \in {\mathbb{N}}}$ be a finite homogeneous Markov chain with transition matrix $ M$. We want to compute :

$\displaystyle v(x)=\sup_{\tau \mbox{s.t}} {\mathbb{E}} \left[ \frac{1}{(1+r)^{\tau+1}} f( X_\tau) \vert X_0=x \right].
$

the value function can be recursively computed by here can be computed as a fixed point

$\displaystyle v(x) = \frac{1}{1+r} \max( M v(x), f(x))$ (1)

In order to numerically solve such a problem we will use a value iteration method and a policy iteration method.

we will use on of the two following routines to generate a stochastic matrix.

function M= Mtrans(n)
  M=rand(n,n); 
  s=sum(M,'c'); 
  M= M./(s*ones(1,n)); 
endfunction

function M= Mprom(n,d) 
  A= 2*diag(ones(1,n)) -diag(ones(1,n-1),1) -diag(ones(1,n-1),-1);
  A(1,2) = 2*A(1,2) ;
  A($,$-1)=2*  A($,$-1);
  M = eye(A) - A*d
endfunction

    

Question 1   Using one of the previous function, choose a state space dimension, generate a transition matrix and write a function for f. build a column vector $ xs$ giving the states of the Markov chain.



    

Question 2   The value iteration méthod to solve the fixed point problem work as follows. Choose an initial vector v0=v which gives $ v_0(x)$ for each of the n possible states. Then iterate $ v_{n+1}=\frac{1}{1+r} \max( M v_n, f)$. Stop the alogorithm when the infinite norm of $ v_{n+1}-v_{n}$ is small enough. Then, plot the evolution of norm$ (v_{n+1}-v_{n})$ during the evolution of the algorithm. Run the same problem with different values of $ r$.



    

Question 3   The policy iteration algorithm evolves as follow. Choose a first value for $ v$ v0=v. Then given $ v_n$ compute a new policy, $ u_n{x} =$   1 if $ Mv_n(x) \ge f(x)$ and $ u_n{x} =$   2 elsewhere. Given a new policy $ u_n$, the value function $ v_{n+1}(x)$ is computed as follows, whe must have $ v_{n+1}(x)= Mv(x)/(1+r)$ for all the states for which the $ u_n$ policy is equal to $ 1$ and $ v_{n+1}(x)=f(x)$ for all the states for which the $ u_n$ policy is equal to $ 2$. Thus we just have to write and solve a linear system to compute $ v_{n+1}$. As for the value iteration, Stop the alogorithm when the infinite norm of $ v_{n+1}-v_{n}$ is small enough. Then, plot the evolution of norm$ (v_{n+1}-v_{n})$ during the evolution of the algorithm and compare the two algorithms. Run the same problem with different values of $ r$.