Presentation of the Stochastic Dynamic Dual Programming Algorithm

Adrien Cassegrain, Vincent Leclère
(last modiﬁcation date: October 10, 2017)

Version pdf de ce document
Version sans bandeaux

1 Preliminary
2 The Structure of SDDP
3 Problem data
4 Dam optimal management

1 Preliminary
1.1 Piecewise linear functions
1.2 The cutting plane method: Kelley algorithm
2 The Structure of SDDP
2.1 The Deterministic case
  2.1.1 Problem statement
  2.1.2 Bellman operator
  2.1.3 Duality property
  2.1.4 SDDP algorithm (deterministic case)
  2.1.5 Initialization and stopping rule
2.2 The Stochastic case
  2.2.1 Problem statement
  2.2.2 Bellman operator
  2.2.3 Duality theory
  2.2.4 SDDP algorithm (stochastic case)
  2.2.5 Initialization and stopping rule
  2.2.6 A few other possibilities
3 Problem data
3.1 Dam dynamics
3.2 Criterion: intertemporal cost
3.3 Probabilistic model for water inﬂows
3.4 Numerical data
3.5 Water turbined policy
3.6 Minimization under constraints
4 Dam optimal management
4.1 DP resolution
4.2 SDDP resolution
4.3 Comparison of SDDP and DP

This practical work is aimed at presenting the theory of the SDDP algorithm and apply it on a simple dam management problem.

The Stochastic Dual Dynamic Programming (SDDP) algorithm is close to the Dynamic Programming method. The main diﬀerence being that, instead of computing the Bellman function at every point, the algorithm construct a polyhedral approximation of the Bellman function. More precisely the algorithm construct a piecewise linear function that is a lower approximation of the exact Bellman function.

This document is given in three parts:

In §1, we present a few results about piecewise linear functions. We also recall the cutting plane algorithm, known as Kelley’s algorithm.
In §2, we present the SDDP algorithm.
In §3, we present the dam management problem we want to tackle.
Finally, in §4, we apply the SDDP algorithm on the dam management problem.

Before going any further you should download the following code:

1 Preliminary

1.1 Piecewise linear functions

Recall that the supremum of a ﬁnite number of hyperplane is a convex piecewise linear function. We consider the function

{ f : ℝn → ℝ, f (x ) := max (ai ⋅ x + bi), i=1...k

(1)

where b_i ∈ ℝ and a_i ∈ ℝⁿ for i ∈{1, ⋅⋅⋅ ,k}.

Question 1 Find a new representation of f as a linear programm. That is f(x) is the value of a minimization problem with linear cost and linear equality or inequality constraints.

1.2 The cutting plane method: Kelley algorithm

The Kelley algorithm enables someone to approximate the minimum of a convex function f on a compact C only by having an oracle able to compute at each point its value and derivative. Recall that a convex function is always above its tangents. The cutting plane method consists in constructing a lower approximation of the objective function f as the maximum of tangents. This lower approximation is a piecewise-linear function, and is improved throughout the algorithm.

Let describe more precisely the algorithm. We start at a given point x₀. The ﬁrst lower approximation is the tangent at this point,

fˇ1(x ) := T (x ). 0 1 0

Then, at each step, we computes the equation of the tangent of f at the minimum of the approximate function ˇ
f ,

{ xk ∈ argmin fˇk−1(x), x∈C Tk (x) := f′(xk) ⋅ (x − xk ) + f(xk ).

(2)

The new lower approximation ^k+1 is given by the maximum of the old one and the tangent,

( ) fˇk+1 (x ) := max fˇk(x ),Tk (x) .

(3)

Question 2

Run the algorithm in the ﬁle Kelley.sce that picture the initial function and the cutting planes found on the same graph. Evaluate graphically the diﬀerence between the minimum found and the real one.

Test the algorithm for diﬀerent convex function and for an increasing number of iteration. Comments.

2 The Structure of SDDP

In this section we present the Stochastic Dual Dynamic Programming (SDDP) algorithm. More precisely we present the so called DOASA implementation. For clarity reason we present it in the deterministic case in a ﬁrst place, and then go to the stochastic case.

2.1 The Deterministic case

The SDDP algorithm in a deterministic setting is really close to the Kelley’s algorithm on a multistage problem instead of a static problem.

2.1.1 Problem statement

We consider a controled dynamic system in discrete time that follows the equation

∀t ∈ {0, ...,T − 1}, xt+1 = ft(xt,ut),

where x_t ∈ 𝕏 is the state at time t of the system and u_t ∈ 𝕌 the control applied at time t; f_t is the dynamics of the system. We assume that the sets 𝕌 and 𝕏 are compact subset of a ﬁnite dimensionnal vector space. We also assume that the functions f_t are linear. At each time t there is a convex cost L_t(x_t,u_t) that depends on the current state and the control choosen. Moreover there is a ﬁnal cost K(x_T) that depends on the ﬁnal state of the system. A policy is a sequence of functions π = (π₁,…,π_T−1) giving for any state x a control u.

We consider the following problem

T− 1 ∑ mui∈n𝕌T Lt(xt,ut) + K (xT ), t=0 s.t. xt+1 = ft(xt,ut).

(4)

This problem can be solved by dynamic programming.

Question 3 Write the Dynamic Programming equation relative to this problem. How to compute the Bellman Value function ? How to compute the optimal policy ?

2.1.2 Bellman operator

In order to understand the SDDP algorithm it is useful to introduce the Bellman operator 𝒯_t of the problem.

For any time t, and any function A mapping the set of states into ℝ we deﬁne:

∀x ∈ 𝕏, 𝒯t(A )(x) = muit∈n𝕌 Lt(x,ut) + A ∘ ft(x,ut).

(5)

Question 4 Write the Dynamic Programming equation with the Bellman operator.

Let study a few properties of the Bellman operator.

Monotonicity For any couple of functions (V,V ), we have

∀x ∈ 𝕏, V (x) ≤ V-(x) ⇒ ∀x ∈ 𝕏, (𝒯 V)(x) ≤ (𝒯 V-)(x).

(6)

Convexity For any function V , if L_t is jointly convex in (x,u), V is convex, and f_t is aﬃne then

x ↦→ (𝒯 V )(x) is convex.

(7)

Linearity For any piecewise linear function $V$ , if $Lt$ is also piecewise linear, and f_t aﬃne, then

x ↦→ (𝒯 V )(x ) is piecewise linear.

(8)

Question 5 Prove these three results.

2.1.3 Duality property

Let J be a function mapping 𝕏 × 𝕌 into ℝ. Consider the function φ deﬁned as

φ (x ) = mui∈n𝕌 J(x,u ),

where J is jointly convex in (x,u). Then the interpretation of the multiplier as a marginal cost implies that a subgradient λ ∈ ∂φ(x₀) of φ at x₀ is the dual multiplier of constraint x₀ − x = 0 in the following problem

min J(x,u), x,u s.t. x0 − x = 0.

In particular it means that, as φ is convex

∀x ∈ 𝕏, φ(x) ≥ φ (x0 ) + ⟨λ,x − x0 ⟩.

(9)

2.1.4 SDDP algorithm (deterministic case)

As in the Kelley’s algorithm, the SDDP algorithm recursively constructs a lower-approximation _t^(k) of each Bellman function V _t as the supremum of a number of aﬃne functions. Each of this aﬃne functions is called a cut.

Stage k of the SDDP algorithm goes as follows.

Suppose that we have a collection of T lower approximation of the Bellman functions denoted _t^(k) that is a maximum of aﬃne functions. In order to improve our approximation we follow an optimal trajectory (x_t^(k))_t of the approximated problem and add a cut computed along the optimal trajectory for each Bellman function.

Thus, we begin with a loop forward in time by setting t = 0 and x_t^(k) = x₀, and solve

min Lt (x, u) + ˇV(t+k1)∘ ft(x, u), x,u (k) x = xt .

We call β_t^(k+1) the value of the problem, λ_t^(k+1) a multiplier of the constraint x = x_t^(k), and u_t^(k) an optimal solution. This can also be written as

( ( ) ( ) { β (kt+1) = 𝒯t ˇVt(+k)1 x (kt) , (k+1) ( (k)) ( (k)) ( λt ∈ ∂𝒯t Vˇt+1 x t .

Question 6 Show that, by (9), and by monotonicity of 𝒯_t, we obtain

( ) ∀x ∈ 𝕏, β (k+1) + ⟨λ(k+1),x − x (k)⟩ ≤ 𝒯 ˇV(k) (x) ≤ 𝒯 (V )(x) = V (x). t t t t t+1 t t+1 t

(10)

Thus we have by (9), and by monotonicity of 𝒯_t,

( ) ∀x ∈ 𝕏, β (k+1)+ ⟨λ(k+1 ),x − x (k)⟩ ≤ 𝒯t Vˇ(k) (x) ≤ 𝒯t (Vt+1 )(x) = Vt(x). t t t t+1

(11)

This ensures that the function xβ_t^(k+1) + ⟨ (k+1) (k)⟩
λ t ,x − xt is indeed a cut, i.e an aﬃne function below V _t. Consequently we update our approximate of V _t by deﬁning

{ ⟨ ⟩ } ˇVt(k+1) = max ˇVt(k),β(tk+1)+ λ(tk+1),⋅ − x(tk) .

Furthermore we can see that ˇ
V _t^(k+1) is convex and lower than V _t. We go to the next time step by setting

( ) x(tk+)1 = ft x(tk),u(tk) .

Upon reaching time t = T we have completed iteration k of the algorithm.

2.1.5 Initialization and stopping rule

In order to initialize the algorithm it seems that we need a lower bound to all value function. According to our assumptions this bound always exist but is not necessary in order to implement the algorithm. Indeed we can choose V _t⁽⁰⁾ = 0 in order to compute the trajectories, and simply set V _t⁽¹⁾ equal to the ﬁrst cut, which means that we “forget” V ⁽⁰⁾ in the maximum that determine V _t⁽¹⁾.

Note that at any step k of the algorithm we have a (non optimal) solution (u^(k))_t of Problem (4), and we can estimate the quality of the solution. Indeed V ₀^(k)(x₀) is a lower bound to the optimal value of Problem (4). Moreover the simulated cost following the solution given by

T∑−1 ( ) ( ) Lt x(kt),u(tk) + K x (kT) , t=0

is an upper bound to the value of Problem (4). A reasonable stopping rule for the algorithm is given by checking that the (relative) diﬀerence of the upper and lower bound is smaller than some constant as they are both asymptotically exact bounds, i.e. both converges toward the value of Problem (4).

2.2 The Stochastic case

Now we introduce some random variables in our problem. This complexify the algorithm in diﬀerent ways:

we need some probabilistic assumptions;
for each stage k we need to do a forward phase that yields a trajectory (x_t^(k))_t, and a backward phase that gives a new cut;
we cannot compute an exact upper bound for the problem’s value.

2.2.1 Problem statement

As in the deterministic case we consider a controled stochastic dynamic system in discrete time that follows the equation

∀t ∈ {0, ...,T − 1}, xt+1 = ft(xt,ut,Wt ),

where x_t ∈ 𝕏 is the state at time t of the system, u_t the control applied at time t, and W_t an exogeneous discrete noise ; f_t is the dynamic of the system.

We assume that the stochastic process (W_t)_{t=1,…,T−1} is independent in time. Moreover we assume that at any time t we know the whole past of the noise process up to, and including, time t. The independence assumption ensures that we can only consider control u_t measurable with respect to (x_t,W_t). This is the so called hazard-decision information structure where we know at time t the realization of the noise before choosing our control.

At each time t there is a jointly convex in (x_t,u_t) cost, L_t(x_t,u_t,W_t) that depends on the current state and the control choosen ; and there is a ﬁnal cost K(x_T) the depends on the ﬁnal state of the system. A policy is a sequence of measurable functions π = (π₁,…,π_T−1) giving for any state x, and realization of the noise w a control u.

We consider the following problem

( ) T∑−1 min 𝔼 Lt (xt,ut,Wt ) + K (xT) , π t=0 s.t. xt+1 = ft(xt,ut,Wt), u = π (x ,W ). t t t t

(12)

2.2.2 Bellman operator

As in the deterministic case we also introduce the stochastic Bellman operator 𝒯_t of the problem. For any time t, and any function A mapping the set of states and noises 𝕏 × 𝕎 into ℝ we deﬁne:

∀x ∈ 𝕏, ^𝒯t(A )(x,w ) = min Lt(x,ut,w ) + A ∘ ft(x,ut,w ). ut∈𝕌

The stochastic Bellman operator 𝒯_t, is given by

∀x ∈ 𝕏, 𝒯t(A )(x ) = 𝔼(^𝒯t(A)(x,Wt )).

Question 7 Write the Dynamic Programming used to solve this problem in two ways: directly, and with the help of the Bellman operator.

Question 8 Show that the Bellman operator have the same properties as in the deterministic case.

2.2.3 Duality theory

Suppose that for an iteration k and a time step t + 1 we have a lower estimation V _t+1^k+1 of V _t+1. We consider, for any w, the problem

min Lt(x,u,w ) + ˇVt(k++11)∘ ft(x,u,w ), x,u s.t x = x(k), t

(13)

and call _t^(k+1)(w) the optimal value, and _t^(k+1)(w) an optimal multiplier of the constraint. As in the deterministic case we have

({ ^(k+1) ^ ( ˇ(k)) (k) β t (w ) = 𝒯t V(t+1 (x)t ,w), ( ^λ(k+1)(w ) ∈ ∂ ^𝒯 Vˇ(k) (x(k),w), t x t t+1 t

(14)

Question 9 Show that we have

⟨ ⟩ ( ) ∀x ∈ 𝕏, ∀ω, ^β(k+1)(w) + ^λ (k+1)(w),x − x (k) ≤ 𝒯^t ˇV (k) (x,w ) ≤ ^Vt(x, w). t t t t+1

(15)

By deﬁning

show that we have

⟨ ⟩ β (kt+1) + λ (kt+1),⋅ − x (kt) ≤ Vt.

(16)

2.2.4 SDDP algorithm (stochastic case)

At the beginning of step k we suppose that we have, for each time step t an approximation _t^(k) of V _t verifying

_t^(k) ≤ V _t,
V _T^k = K,
_t^(k) is convex.

Forward path We randomly select a scenario (w₀,…,w_T−1) ∈ 𝕎^T. We recursively, by a forward loop in time, deﬁne a trajectory (x_t^(k))_t=0,…,T by x_t+1^(k) = f_t(x_t^(k),u_t^(k),w_t), where u_t^(k) is an optimal solution of

( ) ( ) min Lt x (kt),u,wt + ˇV(tk+)1 ∘ ft x (kt),u,wt . u∈𝕌

(17)

This trajectory is a trajectory given by the optimal policy of Problem (12) where V _t is replaced by its current approximation _t^(k).

Backward path We reader improve the approximation of V _t by adding a cut at V _t(x_t^(k)). We procede backward in time by setting t = T − 1, and solve Problem (13) for every w in the support of W_t. We obtain ^
λ _t^(k+1)(w) and ^
β _t^(k+1)(w). As W_t is discrete we can compute exactely the expectation λ_t^(k+1) and β_t^(k+1), and deﬁne the new lower approximation

(k+1) { (k) (k+1) ⟨ (k+1) (k)⟩} ∀x ∈ 𝕏, Vˇt (x) = max Vˇt (x ),β t + λ t ,x − xt

Question 10 Check that _t^(k+1) is a lower convex approximation of V _t.

Go one step back in time: t ← t − 1. Upon reaching t = 0 we have completed step k of the algorithm.

Question 11 For one iteration of the SDDP, algorithm, how many one-step one-alea problem (Problem (17)) do we need to solve ?

2.2.5 Initialization and stopping rule

We can intialize as suggested in the deterministic case. However in order to accelerate the convergence it can be useful to bypass a few forward paths by abritrarily choosing some trajectories (x_t^(k))_t.

As in the deterministic case we have an exact lower bound of Problem (12) given by V ₀^(k)(x₀). However, determining an upper bound is more diﬃcult. Indeed, the upper bound is given as the expectation of the cost induced by the optimal strategy of the approximated problem. This expectation is to be taken on the whole process (W₀,…,W_T−1) and cannot be computed exactly. Consequently this expectation is estimated by Monte-Carlo methods, and we have no control over the error of our solution.

A heuristic stopping rule consist in stopping the algorithm if the lower bound is in the conﬁdence interval of the upper bound for a determined number of Monte-Carlo simulation.

2.2.6 A few other possibilities

In the algorithm presented here, also called DOASA, we select one scenario (one realization of (W₁,…,W_T−1)) to do a forward and backward path. It is also possible to select a number N of scenarios to do the forward path (computation can be parallelized). Then during the backward path we add N cuts to V _t before computing the cuts on V _t−1. This is the original implementation of the algorithm.

Otherwise the CUPPS algorithm suggest to use _t+1^(k) instead of V _t+1^(k+1) in the computation of the cuts (14). In practice it means that we do only a forward path at step k:

select randomly a scenario (w_t)_{t=0,…,T−1};
at time t we have a state x_t^(k), we solve problem (13) for every possible realization of W_t in order to compute the new cut for V _t;
choose the optimal control corresponding to the realization W_t = w_t in order to compute the state x_t+1^(k) where the cut for V _t+1 will be computed, and goes to the next step.

Moreover it is possible to select more than one scenario, and everything can be parralelized in this implementation.

Finally it is always possible, and most of the time fruitful to compute some cuts before starting the algorithm. One way of doing this is to bypass the forward phase by choosing the trajectory (x_t^(k))_t=0,…,T instead of computing it as some optimal trajectory.

3 Problem data

We consider a dam manager intenting to minimize the intertemporal cost obtained by turbining the water in a dam, when the water inﬂow (rain, outﬂows from upper dams) is supposed to be stochastic.

3.1 Dam dynamics

S◟t◝+◜1◞ = ◟S◝t◜◞ − ◟q◝t◜◞ − ◟δ◝t◜◞+ ◟a◝t◜◞ , t ∈ 𝕋 := {t0,...,T − 1} future volume volume turbined spilled inflow

with

time t ∈𝕋 := {t₀,…,T} is discrete (days), and t denotes the beginning of the period [t,t + 1[,
S_t volume (stock) of water at the beginning of period [t,t + 1[, belonging to the set $𝕏 = [0,S ♯]$ , made of water volumes (h m³), where $S ♯$ is the maximum dam volume,
a_t inﬂow water volume (rain, etc.) during [t,t + 1[, belonging to $𝕎 =$ ${0,1,2,...,a ♯}$ , and supposed to be known at the beginning of the period (Hazard-Decision),
q_t turbined outﬂow volume during [t,t+1[, decided at the beginning of period [t,t+1[, belonging to the set $𝕌 = [0,q♯]$ , where q^♯ is the maximum which can be turbined by time unit (and produce electricity),
δ_t spilled outﬂow volume during [t,t + 1[, decided at the beginning of period [t,t + 1[, belonging to the set $𝔻 = ℝ+$ , and such that
$0 ≤ qt + δt ≤ St + at.$ (18)

The dam manager is supposed to make a decision, here turbining q_t and spilling δ_t, at the beginning of the period [t,t + 1], after knowing the water inﬂow a_t.

A water inﬂows scenario is a sequence as

a := (at0,...,aT −1).

(19)

Compared to other methods, we have added the spilled outﬂow volume δ in order to get a linear dynamics and we are placed in the Hazard-Decision case. These two points are mandatory to apply the SDDP algorithm.

3.2 Criterion: intertemporal cost

The manager original problem is one of cost minimization where turbining one unit of water has unitary cost −p_t. On the period from t₀ to T, the costs sum up to

T−1 ∑ − p q + K (S ), tt T t=t0

(20)

where

a scenario of costs
$− p(⋅) = ( − pt0,...,− pT −1)$ (21)

is given (data of the problem) and supposed to be known in advance (deterministic setting),
the ﬁnal term $K (ST )$ gives value to the water volume in the dam at the horizon T.

3.3 Probabilistic model for water inﬂows

We suppose that sequences of uncertainties (a_t₀,…,a_T−1) are random variables with a known probability distribution ℙ on the set {0,…,a^♯}, with ℙ{a_t = a} given.

Notice that the random variables (a_t₀,…,a_T−1) are independent, but that they are not necessarily identically distributed. This allows us to account for seasonal eﬀects (more rain in autumn and winter).

3.4 Numerical data

We now perform numerical simulations, and try diﬀerent policies.

Time. We consider a daily management over one year

t0 = 1 and T = 365,

(22)

Bounds. Concerning the dam and water inﬂows, we consider the following bounds:

0.4 0.5 S0 = 0 hm3, S ♯ = 100 hm3, q♯ = ---× S♯ and a♯ = ---× S♯. 7 7

(23)

These ﬁgures reﬂect the assumptions that, during one week, one can release at maximum 40% of the dam volume, and that during one week of full water inﬂows, an empty dam can be half-ﬁlled.

Costs scenario. The scenario $− p(⋅) = ( − pt0,...,− pT− 1)$ of costs is known in advance. We produce it by one sample from the expression

--- --- 3 3 − pt = (1 + 𝜀(t))− p with − p = − 66MWh ∕hm × 2.7 euros∕MWh

(24)

where $𝜀(t)$ is drawn from a sequence of i.i.d. uniform random variables in $[− 1∕2, 1∕2]$ .

Water inﬂow scenarios. We produce scenarios a(⋅) := a_t₀,…,a_T−1 of water inﬂows by a sequence of independent random draws from the set {0, 1, ⋅⋅⋅ ,a^♯}.

3.5 Water turbined policy

An admissible policy π : 𝕋 × 𝕏 × 𝕎 → 𝕌 × 𝔻 assigns an amount of water turbined and spilled $(q,δ) = πt(S,a )$ to any state S of dam stock volume, to any decision period t ∈ 𝕋 and to any water inﬂow a ∈ 𝕎, while respecting the constraints

( || 0 ≤ qt ≤ q♯, { 0 ≤ δt, | 0 ≤ q + δ ≤ S + a, |( t t t t ♯ 0 ≤ St + at − qt − δt ≤ S

(25)

that can also be written

( || − qt ≤ 0, ||{ qt ≤ q♯, − δt ≤ 0, || ||( qt + δt − St ≤ at,♯ − qt − δt +St ≤ S − at.

(26)

The system (26) is equivalent to

( ) ( ) ( ) − 1 0 0 0 || 1 0 || ( ) || 0 || || q♯ || | 0 − 1 | ⋅ qt + St.| 0 | ≤ | 0 | . |( 1 1 |) δt |( − 1 |) |( a |) ♯ t ◟--−-1◝◜− 1-◞ ◟--◝1◜--◞ ◟-S--−◝◜at--◞ A E b(t)

(27)

In the following, we will simply write as deﬁned A ⋅ ( )
q
δ + S_t.E ≤ b(t).

Once the policy given, we obtain a volume trajectory S(⋅) = (St ,⋅⋅⋅,ST −1,ST )
0 , a turbined trajectory q(⋅) = (q ,⋅⋅⋅,q ,q )
t0 T−1 T and a spilled trajectory δ(⋅) = (δ ,⋅⋅⋅,δ ,δ )
t0 T−1 T produced by the ”closed-loop” dynamics

( { St0 = S0, St+1 = St + at − qt − δt, ( (qt,at) = πt(St,at).

(28)

and function of the scenario a(⋅). Thus, in the end, we obtain the cost

T−1 πt0 ∑ Crit (S0,at0,⋅⋅⋅,a(T)) := − ptqt − K (ST ). t=t0

(29)

3.6 Minimization under constraints

The aim of this part is to compute the minimum of a given convex function under linear constraints and to ﬁnd a cutting plane. We will use this to ﬁnd an optimal strategy.

The problem is

Vtk(z) = min Lt(z,q,δ,at) + V kt+1 ∘ f(z,q,δ,at), q,δ ( ) q s.t. A ⋅ δ + E ⋅ z ≤ b(t)

(30)

In order to use the linear solver, we have to transform this formula into one of the form

min C ⋅ u′, u′ s.t. A ′ ⋅ u + E ′ ⋅ z ≤ b′(t).

(31)

We remind the reader that V _t+1^k is a piecewise linear function written

V k (z) = max {λ ⋅ z + β}, t+1 0≤i≤k i i

which is equivalent to (as seen in § 1.1)

Vtk+1(z) = min γ, γ s.t. γ ≥ λi ⋅ z + βi, ∀i ∈ {1,...,k}

(32)

By writting

$( ) q u = ( δ ) γ$ , $( ) A 0 | − λ1 − λ1 − 1| A ′k = || . . .|| ( .. .. ..) − λk − λk − 1$ , $( ) E | λ1| E ′k = || .|| ( ..) λk$ and $( ) b(t) | − β1 − λ1 ⋅ at| b′k(t) = || . || ( .. ) − βk − λk ⋅ at$ , we obtain the wanted form for the minimization problem (see (31)).

The last step is to ﬁnd the parameters of the cutting plane of the Bellman function V _t^k at a given point z. To do so, we transform the state z in the previous equation in a variable x and add the constraint x = z. The Lagrange multiplier of this equality constraint is the slope of a new cutting plane and the minimum cost is the ordinate at the origin.

The following macro returns the parameters of the plane (λ,β) and the point at which the minimum is reached u_opt given the matricial form of the cost (−p,V ), the constraints (A,b(t),E), the value of the uncertainty a, the dynamics given by (Q,R) and the state z.

Question 12 Explain the objective of the function CutPerAlea. Where, in the SDDP algorithm, is this function required ? How many times is the function called for a step k of the SDDP algorithm ?

4 Dam optimal management

We now solve the problem presented earlier in this practical work, by Dynamic Programming and by Stochastic Dual Dynamic Programming.

Question 13 First of all open the ﬁle damdata.sce, and understand the diﬀerent macro given in this ﬁle, in particular the macro trajectories.

Now deﬁne a policy (a function taking a time t, a current stock s and an inﬂow a and giving an admissible control), and test it. Plot (using plot2d2) the trajectories of the stock with this policy.

4.1 DP resolution

We solve exactly the problem using Dynamic Programming.

Question 14 In the ﬁle sdp.sce a Dynamic Programming approach is presented. Understand how the function works, and plot the optimal stock trajectory using function from damdata.sce.

From the function of sdp.sce what is the exact value of the optimization problem ? Simulate the optimal strategy and give an estimation of this value and asymptotic error bounds.

What do you observe for the ﬁnal stock? Explain why.

4.2 SDDP resolution

We now apply an SDDP approach to the same problem.

In the ﬁles that were downloaded at the beginning, the function SDDP creates an approximate of the Bellman function given the constraints and random trajectories, then the function sddp_policy deﬁnes the optimal policy given the former approximate.

Question 15 Using the function of SDDP picture the trajectory of the stocks corresponding to the optimal policy. Evaluate the optimal payoﬀ, and the error bounds.

Evaluate the lower bound given by SDDP. Deduce the estimated gap (in %).

Plot the improvement of the gap with iterations of SDDP.

4.3 Comparison of SDDP and DP

Question 16 Compare the numerical results given by the DP approach, and the SDDP approach.

Sum up the theoretical and practical diﬀerences between DP and SDDP: what are the pros and cons of each method.