Halmstad 2006
Dynamic programing and Markov chains

Jean-Philippe CHANCELIER

Let $(X_{n})_{n \in {\mathbb{N}}}$ be a finite Markov chain with transition matrix $M^{(n)}$ . We want to compute :

$\displaystyle u_n(x)=\sup_{\tau \;{\cal F}_n t.a., n \leq \tau \leq N} {\mathbb... ...eft[ \sum_{k=n}^{\tau-1} c^{(k)}(X^k)+\psi^\tau ( X^\tau) \vert X^n=x \right].$

The control is a stopping time $n \le \tau \le N$ . As shown in the course the value function can be recursively computed by :

$\displaystyle \left\{ \begin{array}{ll} u_{n}(x) &= \max(M^{(n)} u_{n+1} +c^{(n)}, \psi^{n}), \; \; n=0,\ldots, N-1 u_N &= \psi^N. \end{array} \right.$

(1)

And the optimal stopping time :

$\displaystyle \tau_n = \inf \{ N \ge k \geq n, u_k(X^k) = \psi^k(X^k) \}$

is such that :

$\displaystyle u_n(X^n)= {\mathbb{E}} \left[ \sum_{k=n}^{\tau_n-1} c^{(k)}(X^k)+\psi^{\tau_n}(X^{\tau_n})\vert X^n \right].$

Question 1 Write a first routine which returns a stochastic matrix of size NxN.

Question 2 Using the choosen matrix generate ans plot some samples of the homogeneous Markov chain with states in described by (using grand in Scilab).

Question 3 Choosing an instantaneous cost and a final cost recusively compute the value function and draw the result as a surface (i.e ). Compute also which for a given state at time returns if the optimal strategy is to go on and is the optimal strategy is to stop. Note that the max function can be used to get the maximum value of two quantities but also to get the indice of the value which realize the max.

// A stoping time problem. 
c=... // instantaneous cost 
K=... // final cost 

r=0.05;
C=ones(N,T); // a matrix to store the value function 
U=2*ones(N,T); // a matrix to store the control (1 or 2) 

C(:,T) = ... ; 

for i=T-1:-1:1 
  .... 
  C(:,i)= Ci;
  U(:,i)= ki;
end

plot3d(1:N,1:T,C)

Question 4 Using samples of the markov chain for a given starting state evaluate by Monte Carlo the cost function and compare the results with previous question.

m=10000; // number of samplings 
Cm=0;    // we want to evaluate C(1,1);
for k=1:m  // loop on sampling 
  // the k-th trajectory.
  X=.... // sample a trajectory 
  // the control along the trajectory 
  for i=1:T, u(i)=... end
  stop=... // the stopping time 
  cost=0; 
  ... 
  cost = .. // evaluate the cost for the m-th trajectory
  Cm=Cm+cost/m; // compute the mean 
end

Halmstad 2006
Dynamic programing and Markov chains

Table des matières

1 Stoping time problem

Halmstad 2006 Dynamic programing and Markov chains

Table des matières

1 Stoping time problem

Halmstad 2006
Dynamic programing and Markov chains