Hydroelectricity is the main renewable energy which is produced in France (over 10% of the country annualproduction). It provides a clean (no greenhouse gases emissions) and fast-usable energy that is cheapand substitutable for the thermal one. We address the question of the production management ofcascade-interconnected dams by means of a finite-horizon Markov decision process. Solving such a problem is notstraightforward since it displays a large size, so is likely to suffer from the curse of dimensionality. We handleit with a decomposition-coordination algorithm that we adapt by approximation to the stochasticcase.
This pratical work will follow the lecture entitled Decomposition-coordination methods in
stochastic optimal control (P. Carpentier). The computations will be performed using nspand only nsp. In order to launch nsp write nsp & in a terminal. Type pwd to see the current
repertory and then download the practical work material from the Internet to this repertory. Since
we will use a three dimension dynamic programming algorithm, a part of the programming
functions had to be written in C. Moreover, we have to load some scenarios datas. Thus, we will
need a specific nsp toolbox called COURSES/stochdec. In order to load it, go to the stochdec
repertory and type exec('loader.sce');.
1 Description of the problem
1.1 Dynamics of a single dam and dams cascade
Let time t vary in {0,...,T}. The following real valued random variables are defined on a
probability space :
Xti is the storage level at the beginning of period [t,t + 1); (state)
Uti is the hydroturbine outflow during [t,t + 1); (productioncontrol)
Zti is the inflow from dam i − 1 into dam i during [t,t + 1); (coordinationcontrol)
Wti is the exogeneous inflow into dam i during [t,t + 1).; (noise)
The dynamics of the dam reads:
(1)
When Xti + Wti + Zti−Uti is greater than xi, the difference is spilled from dam i to dam i + 1
so that the upper bound constraint on dam i is directly handled by the dynamics. Therefore the
total outflow from dam i to dam i + 1 is given by:
The bounds constraints are:
(2)
Figure 1: Dynamics of dam i
Each dam i ∈{2,…,N} receives from the dam i − 1 outflows gti−1 as
inflows Zti and sends its own outflows into the dam i + 1 reservoir, at every t ∈{0,…,T}. With the
notation Zt1≡ 0, this leads to the following coupling constraints:
(3)
Figure 2: Dynamics of three cascade-interconnected dams
1.2 The optimization problem
We assume that the noise random variable Wt = (Wt1,Wt2,Wt3) are independent over time. The
hydroelectric production obeys the following valorization mechanism for every dam:
the production at time t, Uti, is pt linearly retributed and a quadratic cost to produce
𝜀 ×Uti2 has to be considered,
a final storage level appreciation function Ki(XTi) prevents the reservoir from being lower
than a target storage level i at the end of the study period,
We consider the expected value of the sum of every dam total gain as the problem
criterion:
(4)
Note that this criterion is both time and space separable. The control U = (U0i,…,UT−1i)i must
satisfy the non anticipativity constraint in the hazard-decision framework. So, Uti has to be
measurable with respect to the noises up to time t:
(5)
The problem consists in minimizing the criterion (4) under the dynamics (1), the coupling (3),
the bounds (2) and the measurability (5) constraints.
1.3 Numerical instance
We will test algorithms on a valley composed of three cascade-interconnected dams over a T = 12
time steps period. The problem is of a discrete type and the dams share the following common
characteristics:
the volume Xti of every dam i at t belongs to the forty one integers set
;
the space in which every control Uti takes its values is discretized over the six integers
set ;
the noise random variables Wti are considered as being interstage independent and
uniformely distributed over a finite set of ten integers.
The simulation part is based on 500 intakes scenarios.
Question 1Execute the script answer-question1.scewhich is stored in the directoryCOURSES/stochdec. This script will draw Nsimulation scenarios for the dam number 1. Takea look at the script and change it to draw the scenarios of the other dams.
The contents of the script answer-question1.sce is reproduced here:
// You should be in COURSES/stochdec to run this script exec('loader.sce'); SIMSCEN=dp_get_simulation_scenarios(); // get the inflows law on dam 1 M=fscanfMat('Datas/Generated_datas/generated_intakes_0.txt'); // QUESTION 1: xclear(); [nss,v]=size(SIMSCEN) N=10 // Draw N scenarions of intakes of dam 1 Sim=SIMSCEN(1:N,:)+1;// take care to add one T=dp_horizon(); res=Sim; for t=1:T-1do res(:,t)=M(Sim(:,t),t); end; xset('colormap',jetcolormap(N)); plot2d([],res'); xtitle(sprintf('first %d simulation scenarios',N));
2 Exact solution by dynamic programming
The noises Wti are independent over time, so that the problem can be theoretically solved by
stochastic dynamic programming (DP). Although, DP is prone to face the curse of dimensionality
(the method is not numerically tractable as soon as the number of dams is larger than 5 ) you will
first solve the three cascade interconnected dams problem by DP since it is numerically possible (as
N = 3). Note that the discretization of the problem avoids us to use an interpolation scheme to
implement the method.
Question 2Running DP on the cascade of three connecteddams is obtained using the function dp_run(). Then call the function dp_simu()as follows[cost, stocks, controls]=dp_simu()to obtain the expected cost and the stocks and thecontrols trajectories. Plot the stocks trajectories and show the optimal expected cost. Notethat, for example, you will obtain the stocks evolution of dam 1 by using S1=stocks{1}.
Question 3For the most sceptical of you, just try to beat the DP-computed strategyby building your own strategy ! ;)For example, you may want to multiply the optimalcontrol mapping by a scalar γ ∈ [0,1] to get a new control U(you can use dp_getu(U)anddp_setu(U)to obtain or set the current used control table). Simulate the strategies obtainedwith your control strategy by using dadp_simu()and compare to what you had obtainedbefore.
Question 4You can also design your own strategy using function dp_f2uand simulateyour strategy as seen in question3. How far is your solution expected cost from the optimalone?
The next script gives some hints on how to use dp_f2u:
function u=myfeedback(t,w,x) // scalar t, vector w of size 4 (three inflows and a price) //, vector x of size 3 (water levels of the dams) u=sin(t)*x(1)*w(4)*[0;1;0]+x(3); endfunction // dp_f2u(fn): use a feedback function given as // argument to fill a control array U. U=dp_f2u(myfeedback) ...
3 Approximate solution by dual approximate dynamic programming
In a few words, we decompose the three dimension dynamic state problem into three one dimension dynamic stateproblems. This is done by:
dualizing the coupling constraints;
coordinating the local optimal solutions to got the global optimal one using an Uzawa algorithm.
3.1 Dualization of the coupling constraint and DADP
We aim at dualizing (3) and at solving the problem by using an Uzawa algorithm: at iteration k,
the associated multiplier is a fixed ℱt-measurable random variable λti+1(k), and the term (under
the expectation) induced by duality in the cost function is
It can be decomposed as
λti+1(k).Zti+1: term pertaining to dam i + 1.
−λti+1(k).gtiXti,Uti,Wti,Zti: term pertaining to dam i.
Finally, the following term is added to the cost of dam i
The main idea of the price decomposition of the problem is to see Zti as an independent
control variable as shown in figures 3. As Uti does, Zti has to observe the (3) constraint.
(a)
(b)
Figure 3: (a): Whole Problem (b): Decomposed Problem
Then, we would like to solve the resulting N independent one-dam subproblems by
DP1 ,
(6)
But, the presence of the random variables λtit=0,…,T−1(k) prevents to use dynamic
programming in a straightforward manner since the random variables λti are not independent over
time. The idea of DADP is to replace the multipliers λti(k) by their conditional expectations
w.r.t. chosen information variable Yti, namely
Note that this is equivalent to replace the (3) constraint by 𝔼Zti− gti−1(…)Yti= 0.
In practice, Yti is a short-memory process that will enter the state variables of the
subproblems. In the sequel we will consider that Yti≡ const, so that ti(k) := 𝔼λti(k).
Thus, the Uzawa algorithm reads:
initialization: choose initial values for the multipliers ti(0) for all i and t,
minimization: solve N independent one dimensional subproblems by dynamic
programming,
coordination: update the multipliers by a gradient step as follows:
simulation: apply the current strategy on a sample of n noise trajectories
ωl, to get n realizations of the random variables (gti(…))(k+1)(ωl) by time
step, compute all discrepancies (δti)(ωl) in the coupling constraints given
by:
gradient step: update the multipliers by taking the average of the discrepancies
(δti)(ωl)
stopping condition: go to step (2) until convergence of the algorithm.
Question 5Run the script job.sceto see iterations of the DADP algorithm solving thethree cascade interconnected dams problem. You should observe the multipliers expectedvalues and the dual costs converging in about three thousand iterations. At the endof iterations you can obtain the optimal multipliers expected values using the functiondadp_get_lambda(). Use this function to store this values in matrix OPT_LAMBDAby callingOPT_LAMBDA=dadp_get_lambda().
Figure 4: Uzawa algorithm
Question 6Have a look at the Uzawa algorithm figure Figure 4. Implement your owngradient step process and make the gradient step size values vary to see how it affects theconvergence of the algorithm. You can use the following program skeleton.
exec loader.sce nspsoc('load_data'); l1=[];l2=[];cost=[],costa=[],costaa=[],costab=[]; dadp_set_gstep([1*ones(11,1),0.1*ones(11,1)]); for i=1:200 dadp_minimization_step(); // DP for each dam given lambdas // now you should update the lambdas using the following functions // dadp_coordination(): computes the lambda gradients that you can then // obtain with dadp_get_dlambda(); // then dadp_get_gstepsize() returns the gradient stepsizes. // you can get or set the current values of lambda with // L=dadp_get_lambda(); // or // dadp_set_lambda(L); end
Question 7Write your own dadp_coordination()function using the following codeskeleton and then compare your code to the reference one.
function [dl,dc]=dadp_coordination1() ndams = dadp_ndams(); T = dadp_horizon(); Nu = dadp_n_uzawa_scenarios(); // dc: used to store the dual cost // dl: used to accumulate the average gradients it should be of size // zeros(2,T-1); for ksimul=0:Nu-1 // dadp_x0(); // gives the initial stocks for t=0:T-2 // dadp_get_noise(ksimul,t); // get the intakes and price // [xp,u,g,costs]=dadp_dynamics(ksimul,t,x,w); ... end // dadp_final_cost(x); // returns the final cost for each dam end dc =- dc /Nu; // average dl = dl/Nu; // average endfunction
Question 8You can change the previous function dadp_coordination1in order to getthe deviations from the coupling constraints trajectories at the optimal multipliers expectedvalues. Compute these values when at the optimal multipliers expected values that you havestored before.
The approximation Yti≡ const is a relaxation of the problem (as the constraint is loosened).
Thus, the strategies that we derive may not be feasible, even if the algorithm converges: we have to
design a method to go back to a feasible strategy.
3.2 A heuristic to construct an admissible solution
Once the algorithm has converged we have feedbacks laws that verify the constraint
But, this means that the physical constraint Zti = gti−1…a.s. is not verified. Consequently one
has to define an heuristic to turn this strategies into an admissible one.
We first note that the optimal control to apply along a given noise trajectory may be computed
using the Bellman function. The way to do this is the following: suppose that you have the three
dimension DP-computed Bellman function
For every scenario
For t=0:1:T-1
get the current state and the inflow value for every dam;
set a variable opt_cost to infinity;
For u_i=0:4:40 (a loop a dam, three loops here thus)
set the corresponding instant cost to a variable cost;
apply the dynamic to get the reached states;
add the reached states t+1 Bellman function value to cost;
If cost < opt_cost then
opt_cost=cost and set the u_i values to a vector opt_u;
apply the dynamics with u=opt_u and collect the new current states;
Question 9Use the simulation part of the previous coordination function to implementthe simulation using the Bellman function you get by DP and compare it to the dp_simu()results.
We don’t have the three dimension Bellman function but we can construct an approximate
Bellman’s value function for the global problem as the sum of the Bellman’s value function of each
subproblem. Consequently we obtain a global admissible strategy by doing a one time step
optimization of the global problem. More precisely if we write V ti(xti) the Bellman’s
function obtained for subproblem i when the DADP algorithm has converged, we define
(xt) = ∑i=1NV ti(xti−1,xti,xti+1), with the convention that¡ x0t = 0. Then the control we choose
at time t when the cascade of dams is in the state xt and the wt is observed is the optimum of
This problem is global on the cascade but only done on one time-step, so numerically
tractable.
Question 10Apply the heuristic to the optimal strategy we computed usingdadp_admissible()and simulate with [cost, stocks, controls]=dadp_simv(). Use thefunction which checked for the feasibility of the DADP computed strategy to see that thestrategy is now feasible. Compare the results with those we get by the application of the exactoptimal strategy.