In this practical work, you will play the role of a dam manager. You will try to maximize the mean intertemporal payoff obtained by selling hydropower produced by water releases, when the water inflows (rain, snowmelt, outflows from upper dams) are supposed to be random and the energy prices deterministic. You will propose different turbining policies and you will compute their expected payoffs. Then, you will compute the optimal payoff and display an optimal policy. At last, you will consider different variations around this problem: highlighting the influence of the information structure by comparing the “decision-hazard” and the “hazard-decision” settings, evaluating a “fair final value of the water”, or testing robustness with respect to the inflow probability distribution.
1 Problem statement and data
We consider a dam manager intending to maximize the intertemporal payoff obtained by selling
power produced by water releases, when the water inflows (rain, snowmelt, outflows from upper
dams) are random and the energy prices are supposed to be deterministic.
1.1 Dam dynamics
We model the dynamics of the water volume in a dam by a function f defined as follows:

x(t+1) = f(t, x(t), u(t), w(t+1)) = min( x̄ , x(t) − u(t) + w(t+1) ) ,   (1)

where
time t ∈ 𝕋 = {ti,ti + 1,…,tf} is discrete (months in this case), and t denotes the
beginning of the period [t,t + 1[; we call ti the initial time and tf the horizon;
x(t) volume of water in the dam at the beginning of period [t,t + 1[, belonging to a
finite set 𝕏 ⊂ [x̲, x̄], where x̲ (resp. x̄) is the minimum (resp. maximum) dam volume;
u(t) turbinated outflow volume during [t,t + 1[, decided at the beginning of the time
period [t,t + 1[, and belonging to a finite set 𝕌 ⊂ [u̲, ū], where u̲ (resp. ū) is the minimum
(resp. maximum) volume which can be turbinated by time unit; we assume that the
minimum value u̲ is equal to zero:

u̲ = 0 ;   (2)
w(t) inflow water volume (rain, snowmelt, etc.) during the time period [t − 1, t[, belonging to
the finite set 𝕎(t) ⊂ [w̲(t), w̄(t)], where w̲(t) (resp. w̄(t)) is the minimum (resp. maximum)
possible inflow at time t; we assume that w̲(t) ≥ 0 for all t; the dependence of the
sets 𝕎(t) upon time t makes it possible to take seasonal effects into account (e.g. more rain in
autumn).
The dam manager is supposed to make a decision at each t ∈ 𝕋, here turbinating u(t) at the
beginning of the period [t,t + 1[, before knowing the water inflow w(t + 1): such a setup
is called decision-hazard setting. We add the following constraint on the turbinated
water:

0 ≤ u(t) ≤ min( x(t) − x̲ , ū ) .   (3)
Constraint (3) ensures that u(t) is feasible, that is, such that the dam volume remains greater
than or equal to its minimum value x̲ (remember that any inflow w(t) is nonnegative).
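The dynamics (1) and the admissible controls under constraint (3) can be sketched as follows in Python (the handout's own code is Scilab; the names `dynamics`, `admissible_controls` and the `step` argument are illustrative assumptions of this sketch, using the bounds of Section 1.4):

```python
# Illustrative Python sketch of the dam dynamics (1) and of the admissible
# turbinated volumes under constraint (3); not part of the handout's Scilab code.
XMIN, XMAX = 0.0, 80.0   # dam volume bounds (hm3), from Section 1.4
UMIN, UMAX = 0.0, 40.0   # turbinated volume bounds (hm3)

def dynamics(x, u, w):
    """Water volume at t+1: water in excess of XMAX is spilled, cf. (1)."""
    return min(XMAX, x - u + w)

def admissible_controls(x, step=8.0):
    """Turbinated volumes satisfying 0 <= u <= min(x - XMIN, UMAX), cf. (3)."""
    umax_x = min(x - XMIN, UMAX)
    n = int(umax_x // step)
    return [UMIN + k * step for k in range(n + 1)]
```

For instance, a dam holding 16 hm³ may only turbinate 0, 8 or 16 hm³, whatever the coming inflow, so the volume can never fall below x̲.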
1.2 Criterion: intertemporal payoff
The manager problem is one of payoff maximization. At each time t, the turbinated water u(t)
produces electricity which is sold on markets at a price p(t). The associated financial
income is modeled by a linear function L(t,u) = p(t)u, leading to the instantaneous payoff

L(t, u(t)) = p(t) u(t) .   (4)
Moreover, the volume x(tf) remaining in the dam at the horizon tf is valued (this is
the so-called final value of water) using a quadratic function K, leading to the final payoff

K(x(tf)) = −α ( min( 0 , x(tf) − xref ) )² .   (5)
Here, α is a coefficient and xref is a given reference volume for the dam at horizon tf. The
final payoff is negative if and only if x(tf) < xref. From ti to tf, the payoffs sum up
to

∑_{t=ti}^{tf−1} L(t, u(t)) + K(x(tf)) .   (6)
The sequence of prices p(ti),…,p(tf− 1) is supposed to be deterministic, hence known in advance
at initial time ti where the optimization problem is posed.
1.3 Probabilistic model for water inflows
A water inflow scenario is a sequence of inflows in the dam from time ti up to tf−1:

w(·) = ( w(ti), …, w(tf−1) ) .   (7)
We model water inflow scenarios using a sequence of random variables with a known joint
probability distribution ℙ such that
the random variables w(ti),…,w(tf− 1) are independent,
each random variable w(t) follows a uniform probability distribution
on the finite set 𝕎(t):

ℙ{ w(t) = w } = 1 ∕ Card 𝕎(t) , ∀w ∈ 𝕎(t) .   (8)
Notice that the random variables w(ti),…,w(tf− 1) are independent, but that they are not
identically distributed. Independence is a key assumption to obtain the dynamic programming
equation (18) with state x.
1.4 Numerical data
Time.
We consider a monthly management (the interval [t,t + 1[ represents one month) over one year,
so that

ti = 1 , tf = 13 , 𝕋 = {1, …, 13} .   (9)
State and control.
Concerning the dam volume and the turbinated water, we consider the following
bounds:

x̲ = 0 hm³ , x̄ = 80 hm³ , u̲ = 0 hm³ , ū = 40 hm³ .   (10)
We assume that the dam volume is discretized using a stepsize δx = 2 hm³, whereas the stepsize
used for discretizing the control is δu = 8 hm³ (multiple of δx). Accordingly, the sets containing
the possible values of the state and the control are

𝕏 = {0, 2, 4, …, 80} , 𝕌 = {0, 8, 16, 24, 32, 40} .
The water volume in the dam at time ti is known and equal to xi = 40hm3, and the reference
volume used to define the final payoff K in (5) is xref = 40hm3.
Prices scenario.
The price scenario p(ti),…,p(tf− 1), known in advance, is shown in Figure 1.
Figure 1: Prices trajectory for valuing the turbinated water
Water inflow scenarios.
At each time t, the range 𝕎(t) of the water inflow w(t) is obtained by discretizing an interval
centered around a given mean value (see Figure 2). Each interval is discretized using a
stepsize δw = 2 hm³ (multiple of δx). Consider for example time t = 1: the minimum (resp.
maximum) inflow value is w̲(1) = 12 hm³ (resp. w̄(1) = 28 hm³), so that the finite support 𝕎(1)
of the (uniform) probability distribution associated with the random variable w(1)
is

𝕎(1) = {12, 14, 16, 18, 20, 22, 24, 26, 28} ,

the probability weight of each element w ∈ 𝕎(1) being 1∕9.
Knowing the probability distribution, it is easy to draw water inflow scenarios since we assumed
that the random variables w(t) are independent.
Figure 2: Inflows probability laws support (left) and some associated scenarios (right)
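The inflow model of §1.3 (independent, uniformly distributed inflows on time-varying finite supports) can be sketched in Python as follows; the handout's own generator is the Scilab macro inflow_scenarios, and the names `supports` and `draw_scenario` below are illustrative assumptions:

```python
import random

# Hedged Python sketch of the inflow model: independent uniform draws on the
# finite supports W(t) built from the mean values and maximal deviations of
# Section 1.4. Not the handout's Scilab code.
WEXP = [20, 24, 16, 12, 8, 4, 4, 10, 16, 18, 30, 20]   # mean inflows (hm3)
WECT = [8, 16, 8, 8, 4, 2, 2, 8, 10, 12, 20, 10]       # maximal deviations
WDLT = 2                                               # discretization step

def supports():
    """Finite support W(t) of each monthly inflow, cf. Section 1.4."""
    return [list(range(m - d, m + d + 1, WDLT)) for m, d in zip(WEXP, WECT)]

def draw_scenario(rng):
    """One inflow scenario: independent uniform draws on each W(t)."""
    return [rng.choice(wt) for wt in supports()]
```

With these data the first support has 9 elements (from 12 to 28 hm³), each of probability 1/9, as in the example of §1.4.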
Scilab code.
The following Scilab code contains the data and macros corresponding to the numerical
example described in §1.4.
//------------------------------
// PROBLEM DATA
//------------------------------

// Time characteristics
// --------------------
// Time horizon
Ti=1;  // Don't change this value!
Tf=13;

// Dam characteristics
// -------------------
xmin=0;
xmax=80;
xini=40;
xref=40;
// Dam discretization
xdlt=2;  // Integer: xmin, xmax and xini are multiples of xdlt
Nx=((xmax-xmin)/xdlt)+1;

// Control characteristics
// -----------------------
umin=0;  // Don't change this value!
umax=40;
// Control discretization
udlt=8;  // Integer, multiple of xdlt: umin and umax are multiples of udlt
Nu=((umax-umin)/udlt)+1;

// Electricity prices
// ------------------
prices=[48.0,47.0,87.0,37.0,35.0,40.0,29.0,16.0,33.0,38.0,48.0,36.0];

// Inflows characteristics
// -----------------------
// Inflow discretization
wdlt=2;  // Integer, multiple of xdlt
// Inflow mean values and maximal variations
wexp=[20.0,24.0,16.0,12.0,08.0,04.0,04.0,10.0,16.0,18.0,30.0,20.0];
wect=[08.0,16.0,08.0,08.0,04.0,02.0,02.0,08.0,10.0,12.0,20.0,10.0];

// Probability laws of the inflows (uniform law with finite support)
wlaws=list();
plaws=list();
for t=Ti:Tf-1 do
  wlaws(t)=[(wexp(t)-wect(t)):wdlt:(wexp(t)+wect(t))];
  plaws(t)=ones(wlaws(t))/length(wlaws(t));
end

// Macro generating Ns inflow scenarios
function wscenarios=inflow_scenarios(Ns,wlaws,plaws)
  // Random number generator
  grand("setgen","clcg2");
  grand("setsd",12345,67890);
  // Scenario generation
  wscenarios=zeros(Ns,Tf-1);
  for t=Ti:Tf-1 do
    // Probability law at t
    wlawt=wlaws(t);
    plawt=plaws(t);
    // Probability transition matrix for uniform sampling
    // We use grand(...,"markov",...,...) to generate independent sequences
    proba=ones(length(plawt),1)*plawt;
    samples=grand(Ns,"markov",proba,1);
    wscenarios(:,t)=wlawt(samples)';
  end
endfunction
Question 1
a)
[1] Copy this code into a file DamManagement.sce and check that it corresponds to the data given in §1.4.
b)
[2] Launch Scilab and execute the file DamManagement.sce. Use the macro inflow_scenarios to generate 100 scenarios of water inflow. Plot the first 20 scenarios and compare them to the ones given in Figure 2.
2 Simulation and evaluation of given policies
In order to simulate (and then optimize) the dam behavior, we need to have the Scilab macros
computing the dynamics (1) and the payoffs (4) and (5).
//------------------------------
// DAM MACROS
//------------------------------
function xn=dynamics(t,x,u,w)
  // Dynamics of the dam
  xn=min(xmax,x-u+w);
endfunction

function payoff=instant_payoff(t,u)
  // Instantaneous payoff at time t
  payoff=prices(t)*u;
endfunction

function payoff=final_payoff(x)
  // Final payoff at horizon (reference volume xref, cf. (5))
  payoff=-min(0,x-xref).^2;
endfunction
Question 2 [2] Copy the above Scilab code at the end of the file DamManagement.sce and check that it corresponds to the expressions (1)–(4)–(5). Notice that these macros are written in such a way that they may be fed either with scalar values or with vector values. This feature will be used for efficiently simulating a bunch of inflow scenarios (see Question 3).
2.1 Evaluation of a turbining policy
An admissible policy γ : 𝕋 × 𝕏 → 𝕌 assigns a turbinated water amount u = γ(t,x) ∈ 𝕌 to
any time t ∈ 𝕋 and to any dam volume x ∈ 𝕏, while respecting constraint (3), that
is,

0 ≤ γ(t,x) ≤ min( x − x̲ , ū ) , ∀(t,x) ∈ 𝕋 × 𝕏 .
Given an admissible policy γ and given an inflow scenario

w(·) = ( w(ti), …, w(tf−1) ) ,   (11)
we are able to build a dam volume trajectory

x(·) = ( x(ti), …, x(tf) ) ,   (12)
and a turbinated water trajectory

u(·) = ( u(ti), …, u(tf−1) ) ,   (13)
produced by the “closed-loop” dynamics initialized at the initial time ti by

x(ti) = xi ,   (14a)
u(t) = γ( t, x(t) ) ,   (14b)
x(t+1) = f( t, x(t), u(t), w(t+1) ) .   (14c)
We also obtain the payoff associated with the inflow scenario w(·):

J( γ, w(·) ) = ∑_{t=ti}^{tf−1} L( t, u(t) ) + K( x(tf) ) ,   (15)
where x(⋅) and u(⋅) are given by (14c). The expected payoff associated with the policy γ
is

J(γ) = 𝔼[ ∑_{t=ti}^{tf−1} L( t, u(t) ) + K( x(tf) ) ] ,   (16)
where the expectation 𝔼 is taken with respect to the product probability ℙ,
whose marginals are given by (8). The true expected value (16) is difficult to
compute, and we evaluate it by the Monte Carlo method using Ns inflow scenarios w1(·),…,wNs(·):

Ĵ(γ) = (1∕Ns) ∑_{s=1}^{Ns} [ ∑_{t=ti}^{tf−1} L( t, us(t) ) + K( xs(tf) ) ] ,   (17)

where xs(·) and us(·) denote the trajectories generated by the policy γ along the scenario ws(·).
By the law of large numbers, the mean payoff (17) is a “good” approximation of the expected
payoff (16) if the number of scenarios is “large enough”.
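The forward simulation behind (17) can be sketched compactly in Python (the handout's own simulator is the Scilab macro simulation; the name `evaluate` and the clamping of the policy output are assumptions of this sketch):

```python
# Hedged Python sketch of the Monte Carlo evaluation (17) of a policy:
# simulate the closed loop (14a)-(14c) along each inflow scenario, accumulate
# the instantaneous payoffs (4) and add the final payoff (5).
PRICES = [48, 47, 87, 37, 35, 40, 29, 16, 33, 38, 48, 36]
XMIN, XMAX, UMAX, XINI, XREF = 0, 80, 40, 40, 40

def evaluate(policy, scenarios):
    """Mean payoff (17) of `policy` over a list of inflow scenarios."""
    total = 0.0
    for w in scenarios:
        x, payoff = XINI, 0.0
        for t, (p, wt) in enumerate(zip(PRICES, w)):
            u = max(0, min(policy(t, x), min(x - XMIN, UMAX)))  # enforce (3)
            payoff += p * u                        # instantaneous payoff (4)
            x = min(XMAX, x - u + wt)              # dynamics (1)
        payoff += -min(0, x - XREF) ** 2           # final payoff (5)
        total += payoff
    return total / len(scenarios)
```

For example, along the deterministic scenario of 8 hm³ inflows every month, the constant policy u = 8 keeps the volume at 40 hm³ and collects 8 × (sum of the prices) with no final penalty.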
We propose the following Scilab code in order to evaluate the mean payoff associated to a
policy given by the Scilab macro policy.
//------------------------------
// SCENARIO BASED SIMULATOR
//------------------------------
function [xscenarios,uscenarios,cscenarios]=simulation(wscenarios,policy)
  // Initialization
  xscenarios=zeros(Ns,Tf);    // used to store the state trajectories
  uscenarios=zeros(Ns,Tf-1);  // used to store the control trajectories
  cscenarios=zeros(Ns);       // used to store the payoff values
  // Simulation in forward time
  xscenarios(:,1)=xini*ones(Ns,1);
  for t=Ti:Tf-1 do
    x=xscenarios(:,t);
    u=policy(t,x);
    w=wscenarios(:,t);
    xn=dynamics(t,x,u,w);
    cscenarios=cscenarios+instant_payoff(t,u);
    xscenarios(:,t+1)=xn;
    uscenarios(:,t)=u;
  end
  cscenarios=cscenarios+final_payoff(xn);
  printf('\n Mean payoff (Monte Carlo): %f\n',mean(cscenarios));
endfunction
Question 3
a)
[1] Copy the above Scilab code at the end of the file DamManagement.sce. Check that it corresponds to the expressions (14c)–(15)–(17) for a given policy γ (input argument policy of the macro simulation).
b)
[2] Explain in detail how the macro simulation computes these quantities (see Question 2).
2.2 Simulation of a given policy
In order to test the Scilab macro simulation given above, consider the following code.
Question 4
(a)
[2] Explain the control policy induced by the Scilab macro heuristic_policy. Copy the relevant part of the above Scilab code at the end of DamManagement.sce, then generate 10,000 inflow scenarios and simulate the policy heuristic_policy along them.
(b)
[2] Copy the relevant part of the above Scilab code at the end of DamManagement.sce, then plot some trajectories corresponding to the dam volume and to the turbinated water obtained using this policy, and provide a histogram of the payoff distribution. Comment on the figures thus obtained.
We now want to design new policies, writing Scilab macros following the model given by the
macro below:
function u=my_policy(t,x)
  u=max(umin,min(WRITE_A_FORMULA_HERE,min(x,umax)));
endfunction
Question 5
(a)
[2] Explain why we use the expression max(umin,min( ..... , min(x,umax))) in the macro my_policy above.
(b)
[1+1+2] Write the Scilab code of the following policies:
always turbinate at the maximum possible (myopic),
turbinate a fraction of the maximum possible,
and evaluate them thanks to the scenario based simulator.
(c)
[2] Propose a policy which depends on the knowledge of the prices, and explain how you designed this policy.
(d)
[2] Evaluate this last policy, and compare it to the ones designed in item (b).
3 Resolution by the stochastic dynamic programming approach
In order to compute the optimal turbinated water policy, we provide the stochastic dynamic
programming (or Bellman) equation associated with the problem of maximizing the expected
payoff (16), as well as the corresponding Scilab code.
3.1 Stochastic dynamic programming equation
Since the water inflows w(t) are supposed to be independent random variables, Dynamic
Programming applies, and the equation associated with the problem of maximizing the expected
payoff (16) writes

V(tf, x) = K(x) ,
V(t, x) = max_{0 ≤ u ≤ min(x−x̲, ū)} { L(t,u) + 𝔼_{w(t+1)}[ V( t+1, f(t,x,u,w(t+1)) ) ] } ,   (18)

for t varying from tf−1 to ti. The expectation 𝔼 is taken with respect to the marginal
probability distribution ℙw(t+1) given by (8), that is,

𝔼_{w(t+1)}[ V( t+1, f(t,x,u,w(t+1)) ) ] = ∑_{w ∈ 𝕎(t+1)} ℙ{ w(t+1) = w } V( t+1, f(t,x,u,w) ) .
A difficulty when programming the Bellman equation is that we need to manipulate both
the values (t,x) of the time and of the state, and the associated indices (t,i) in the
table storing the values of the function V (t,x). In our application, the treatment of
index t is obvious because t ∈ 𝕋 = {ti,ti + 1,…,tf}, but we need a tool performing the
conversion for the volume. The following Scilab macro state_index returns the index i
associated to an element x in the finite set 𝕏 containing the possible values of the dam
volume.
function i=state_index(x)
  // Index associated with a volume value
  i=dsearch(x,[xmin:xdlt:xmax],'d');
  if i==0 then
    printf('\n Invalid dam volume: %f\n',x);
  end
endfunction
Question 6 [2] Copy the above Scilab code at the end of the file DamManagement.sce. Explain in detail how the macro works.
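Since the state grid is regular, the value-to-index conversion has a closed form; a minimal Python analogue is sketched below (Scilab's dsearch returns a 1-based index, whereas this sketch is 0-based, and the name `state_index` is reused for illustration only):

```python
# Minimal Python analogue of the Scilab macro state_index: convert a dam
# volume x in X = {xmin, xmin+xdlt, ..., xmax} into a 0-based table index.
XMIN, XMAX, XDLT = 0, 80, 2

def state_index(x):
    """0-based index of volume x in the discretized state grid."""
    if x < XMIN or x > XMAX or (x - XMIN) % XDLT != 0:
        raise ValueError(f"invalid dam volume: {x}")
    return (x - XMIN) // XDLT
```

For instance, the volume 40 hm³ is stored at index 20 of a 41-element table.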
3.2 Scilab code for the dynamic programming equation
We provide the following code in order to compute the Bellman function.
//-------------------------------
// STOCHASTIC DYNAMIC PROGRAMMING
//-------------------------------
// Initialization
Vbell=-ones(Tf,Nx)*%inf;  // Used to store the optimal payoff functions
Ubell=ones(Tf-1,Nx)*%inf; // Used to store the optimal controls

// SDP computation by backward induction
function [Vbell,Ubell]=sdp_solve()
  // Bellman function at final time
  payoff=final_payoff([xmin:xdlt:xmax]);
  Vbell(Tf,:)=payoff';
  // Loop backward in time
  for t=Tf-1:-1:Ti do
    wlawt=wlaws(t);
    plawt=plaws(t);
    Nw=length(wlawt);
    for x=xmin:xdlt:xmax do
      i=state_index(x);
      for u=[umin:udlt:min(x,umax)] do
        ctot=0;
        for k=1:Nw do
          w=wlawt(k);
          p=plawt(k);
          xn=dynamics(t,x,u,w);
          in=state_index(xn);
          ctot=ctot+(p*Vbell(t+1,in));
        end
        ctot=ctot+instant_payoff(t,u);
        if ctot>Vbell(t,i) then
          Vbell(t,i)=ctot;
          Ubell(t,i)=u;
        end
      end
    end
  end
endfunction
Question 7
(a)
[2] Copy the above Scilab code at the end of the file DamManagement.sce. Explain in detail why the macro sdp_solve corresponds to the computation of the Dynamic Programming equation (18).
(b)
[1] Execute the macro sdp_solve, which generates the table Vbell containing the values of the Bellman function and the table Ubell containing the values of the optimal controls. Display the value of the Bellman function at (ti,xi).
(c)
[2] Plot different Bellman functions and comment on their shape.
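To fix ideas, the backward induction performed by sdp_solve can be transcribed into Python on the data of §1.4; the names below (`solve`, `STATES`, `SUPPORTS`) are assumptions of this sketch, not the handout's Scilab identifiers:

```python
# Hedged Python transcription of the backward induction (18) on the data of
# Section 1.4: V[t][i] plays the role of Vbell, U[t][i] the role of Ubell.
XMIN, XMAX, XDLT = 0, 80, 2          # dam volume grid (hm3)
UMAX, UDLT = 40, 8                   # turbinated volume grid (hm3)
XREF = 40                            # reference volume of the final payoff (5)
PRICES = [48, 47, 87, 37, 35, 40, 29, 16, 33, 38, 48, 36]
WEXP = [20, 24, 16, 12, 8, 4, 4, 10, 16, 18, 30, 20]
WECT = [8, 16, 8, 8, 4, 2, 2, 8, 10, 12, 20, 10]

STATES = list(range(XMIN, XMAX + 1, XDLT))
SUPPORTS = [list(range(m - d, m + d + 1, 2)) for m, d in zip(WEXP, WECT)]

def solve():
    """Bellman values V[t][i] and optimal controls U[t][i], cf. (18)."""
    T = len(PRICES)
    V = [[0.0] * len(STATES) for _ in range(T + 1)]
    U = [[0] * len(STATES) for _ in range(T)]
    for i, x in enumerate(STATES):
        V[T][i] = -min(0, x - XREF) ** 2                  # final payoff (5)
    for t in range(T - 1, -1, -1):                        # backward in time
        for i, x in enumerate(STATES):
            best, best_u = -float("inf"), 0
            for u in range(0, min(x - XMIN, UMAX) + 1, UDLT):  # constraint (3)
                exp_next = sum(
                    V[t + 1][(min(XMAX, x - u + w) - XMIN) // XDLT]
                    for w in SUPPORTS[t]
                ) / len(SUPPORTS[t])                      # uniform law (8)
                val = PRICES[t] * u + exp_next            # Bellman sum (18)
                if val > best:
                    best, best_u = val, u
            V[t][i], U[t][i] = best, best_u
    return V, U
```

One expected feature, useful when commenting the plots of Question 7 (c), is that each function V(t, ·) is nondecreasing in the volume x: a fuller dam can reproduce any admissible strategy of an emptier one.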
3.3 Simulation of the optimal policy
Consider the following Scilab macro.
function u=optimal_policy(t,x)
  i=state_index(x);
  u=Ubell(t,i)';
endfunction
Question 8
(a)
[2] Explain why the control policy induced by the Scilab macro optimal_policy is optimal.
(b)
[1] Generate 10,000 water inflow scenarios and, using the macro simulation, simulate the policy optimal_policy along them. Plot some trajectories corresponding to the dam volume and to the turbinated water obtained using this policy.
(c)
[2] Provide a histogram of the distribution of the random payoff, and compare the mean payoff (17) with the value V(ti,xi) of the Bellman function evaluated at (ti,xi).
(d)
[3] Compare these results with those obtained using all previous policies.
Here we assume that the values of the optimal control have not been stored in the table
Ubell, so that the only outcome of the macro sdp_solve is the table Vbell (Bellman
values).
Question 9 [2] Provide a method and the corresponding code to compute the optimal value of the control at time t for a given dam volume x(t), using only the values stored in the table Vbell at time t + 1.
Hint: the optimal control at (t,x(t)) is the arg max of the following optimization problem:

max_{0 ≤ u ≤ min(x(t)−x̲, ū)} { L(t,u) + 𝔼_{w(t+1)}[ V( t+1, f(t,x(t),u,w(t+1)) ) ] } .   (19)
Be careful that your code must accept a vector of volumes as input.
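A scalar Python sketch of this one-step arg max is given below; unlike the Scilab code required by the question, it does not handle a vector of volumes, and the names `optimal_control` and `v_next` (an array giving V(t+1, ·) by state index) are assumptions:

```python
# Hedged sketch for Question 9: recover the optimal control at (t, x) from the
# Bellman values at time t+1 only, via the arg max in (19).
XMIN, XMAX, XDLT, UMAX, UDLT = 0, 80, 2, 40, 8

def optimal_control(t, x, v_next, price, support):
    """Arg max over admissible u of price*u + E[ V(t+1, f(t,x,u,w)) ]."""
    best, best_u = -float("inf"), 0
    for u in range(0, min(x - XMIN, UMAX) + 1, UDLT):   # constraint (3)
        exp_next = sum(
            v_next[(min(XMAX, x - u + w) - XMIN) // XDLT] for w in support
        ) / len(support)                                # uniform average
        val = price * u + exp_next
        if val > best:
            best, best_u = val, u
    return best_u
```

Two sanity checks: with a flat V(t+1, ·) the arg max turbinates as much as possible when the price is positive, and with a zero price but a V(t+1, ·) increasing in the volume it turbinates nothing.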
4 Possible extensions
In this section, we propose several developments and extensions for the dam management
problem.
4.1 Hazard-decision framework
In the hazard-decision framework, the dam manager is supposed to make a decision, here
turbining u(t) at the beginning of the period [t,t + 1[, knowing the water inflow w(t + 1) in
advance.
The Bellman equation is now

V(tf, x) = K(x) ,
V(t, x) = 𝔼_{w(t+1)}[ max_{0 ≤ u ≤ min(x−x̲, ū)} { L(t,u) + V( t+1, f(t,x,u,w(t+1)) ) } ] ,   (20)

the maximization in u being performed knowing the realization of the inflow w(t+1).
Once the value function V(t,x) has been evaluated and stored for all time t and state x, the optimal
control at time t, starting from state x and observing the water inflow w, is the arg max
of

max_{0 ≤ u ≤ min(x−x̲, ū)} { L(t,u) + V( t+1, f(t,x,u,w) ) } ,   (21)
which leads to an optimal policy depending on the triplet (t,x,w).
Question 10
(b)
[2] Adapt the Scilab macro sdp_solve to solve the hazard-decision Bellman equation (20). We decide not to store the optimal policy in the table Ubell.
(c)
[1+2] Based on (21), write the Scilab code which computes the optimal control as a function of (t,x,w), and use it to obtain by simulation the optimal trajectories of the state and the control, as well as the optimal payoff distribution. Compare with the decision-hazard results.
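The only structural difference between (18) and (20) is the order of the max over u and the expectation over w. The toy Python sketch below makes that swap explicit on a two-period instance; every number in it is an illustrative toy datum, not the §1.4 data:

```python
# Toy sketch contrasting decision-hazard (18), max then E, with
# hazard-decision (20), E then max: observing w(t+1) before deciding is extra
# information, so the hazard-decision value can only be larger.
XMAX_TOY = 4
STATES = [0, 1, 2, 3, 4]              # toy volumes
INFLOWS = [0, 1, 2]                   # uniform toy inflows
PRICE = 3.0                           # constant toy price

def controls(x):
    return range(0, x + 1)            # u <= x - xmin, with xmin = 0

def step(x, u, w):
    return min(XMAX_TOY, x - u + w)   # toy dynamics, cf. (1)

def final(x):
    return -min(0, x - 2) ** 2        # toy final payoff, reference volume 2

def value(hazard_decision):
    """Value function at initial time over a two-period toy horizon."""
    V = {x: final(x) for x in STATES}
    for _t in range(2):               # backward over two identical periods
        new_v = {}
        for x in STATES:
            if hazard_decision:       # E[ max_u ... ], cf. (20)
                new_v[x] = sum(
                    max(PRICE * u + V[step(x, u, w)] for u in controls(x))
                    for w in INFLOWS
                ) / len(INFLOWS)
            else:                     # max_u E[ ... ], cf. (18)
                new_v[x] = max(
                    PRICE * u
                    + sum(V[step(x, u, w)] for w in INFLOWS) / len(INFLOWS)
                    for u in controls(x)
                )
        V = new_v
    return V
```

Comparing the two runs state by state illustrates the value of information discussed in this section: the hazard-decision value dominates the decision-hazard one.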
4.2 Computation of a fair final value of water
Till now, the gain in leaving water in the dam at the end of the time horizon was arbitrarily fixed.
Now, we provide a procedure to estimate a “fair final value of water”, that is, a function x ↦ K(x)
as in (5).
The intuition behind the procedure is that the final value of water is the value that a manager
would put on the dam, were he to run it from the date tf + 1 to infinity. The final value of water is
the solution of an infinite horizon maximization problem. The procedure below mimics an
algorithm to find a fixed point by iterations.
First, we start with a zero final payoff function and obtain, by backward
induction, the Bellman function at initial time ti. Up to a translation (to account for
the fact that an empty dam has zero final value of water), we identify this Bellman function
with the new final value of water, and we solve the backward induction again. Proceeding along, we expect
that this loop converges towards a function x ↦ K(∞)(x) which is a good candidate for the final
value of water.
We design a loop for n = 1,…,N, starting with a zero final value of water

K(1)(x) = 0 , ∀x ∈ 𝕏 ,

then solving the backward Bellman equation

V(n)(tf, x) = K(n)(x) ,
V(n)(t, x) = max_{0 ≤ u ≤ min(x−x̲, ū)} { L(t,u) + 𝔼_{w(t+1)}[ V(n)( t+1, f(t,x,u,w(t+1)) ) ] } ,

and then closing the loop by choosing the new final value of water

K(n+1)(x) = V(n)(ti, x) − V(n)(ti, x̲) ,   (22)

as the Bellman function at the initial time ti after a shift, so that K(n+1)(x̲) = 0.
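The loop above can be sketched in Python on the §1.4 data as follows; the function names (`backward`, `fair_final_value`) and the number of iterations are illustrative assumptions:

```python
# Hedged sketch of the fixed-point loop of Section 4.2: each iteration solves
# the yearly Bellman recursion with the current final value of water, then the
# shifted Bellman function at ti becomes the next final value, as in (22).
XMIN, XMAX, XDLT, UMAX, UDLT = 0, 80, 2, 40, 8
PRICES = [48, 47, 87, 37, 35, 40, 29, 16, 33, 38, 48, 36]
WEXP = [20, 24, 16, 12, 8, 4, 4, 10, 16, 18, 30, 20]
WECT = [8, 16, 8, 8, 4, 2, 2, 8, 10, 12, 20, 10]
STATES = list(range(XMIN, XMAX + 1, XDLT))
SUPPORTS = [list(range(m - d, m + d + 1, 2)) for m, d in zip(WEXP, WECT)]

def backward(v_final):
    """Bellman function at initial time ti for a given final value of water."""
    V = v_final[:]
    for t in range(len(PRICES) - 1, -1, -1):
        new_v = []
        for x in STATES:
            best = -float("inf")
            for u in range(0, min(x - XMIN, UMAX) + 1, UDLT):
                e = sum(V[(min(XMAX, x - u + w) - XMIN) // XDLT]
                        for w in SUPPORTS[t]) / len(SUPPORTS[t])
                best = max(best, PRICES[t] * u + e)
            new_v.append(best)
        V = new_v
    return V

def fair_final_value(n_iter=5):
    """Iterate (22): K(1) = 0, then K(n+1)(x) = V(n)(ti,x) - V(n)(ti,xmin)."""
    K = [0.0] * len(STATES)
    for _ in range(n_iter):
        V = backward(K)
        K = [v - V[0] for v in V]   # shift so that K(xmin) = 0
    return K
```

By construction the candidate final value vanishes for the empty dam and, like the Bellman functions themselves, is nondecreasing in the stored volume.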
Question 11 Implement the iterative procedure described above, and compute and plot the “fair final value of water” in the dam.
4.3 Robustness with respect to water inflow scenarios
An interesting point, which occurs in practical problems, is the following. In the case where the
random variables w(ti),…,w(tf− 1) are correlated in time, it is always possible to obtain the
marginal probability laws ℙw(t) of each w(t) and then compute the Bellman function as
in §3. We thus obtain a feedback policy that would be optimal if the random inflow
variables were uncorrelated. The loss of optimality may be evaluated by simulating the
feedback policy along scenarios drawn using the “true” probability law ℙ of the process
w(ti),…,w(tf− 1).
Question 12
(a)
Draw inflow scenarios according to an auto-regressive AR(1) process.
(b)
Compute the marginal probability distribution ℙw(t) induced by these scenarios.
(c)
Compute the Bellman solution of the problem using these marginal probability laws andthe associated optimal policy.
(d)
Using this policy, run a simulation along scenarios drawn according to the AR(1) process.
(e)
Evaluate the loss of optimality induced by ignoring the correlation in the inflow process.
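Steps (a) and (b) of Question 12 can be sketched as follows in Python; the AR coefficient, the shock distribution, the clipping to the supports of §1.4 and all names are assumptions of this sketch:

```python
import random

# Hedged sketch for Question 12 (a)-(b): draw time-correlated inflow scenarios
# from an AR(1) recursion on the deviation from the monthly mean, then
# estimate the empirical marginal law of w at a given month.
WEXP = [20, 24, 16, 12, 8, 4, 4, 10, 16, 18, 30, 20]   # mean inflows (hm3)
WECT = [8, 16, 8, 8, 4, 2, 2, 8, 10, 12, 20, 10]       # maximal deviations

def ar1_scenarios(ns, rho=0.7, seed=0):
    """ns scenarios whose normalized deviations follow an AR(1) process."""
    rng = random.Random(seed)
    out = []
    for _ in range(ns):
        dev, scen = 0.0, []
        for m, d in zip(WEXP, WECT):
            dev = rho * dev + rng.uniform(-1.0, 1.0)   # AR(1) recursion
            w = m + max(-1.0, min(1.0, dev)) * d       # clip to [m - d, m + d]
            scen.append(2 * round(w / 2))              # snap to the 2 hm3 grid
        out.append(scen)
    return out

def empirical_marginal(scenarios, t):
    """Empirical distribution of the inflow at month t (Question 12 (b))."""
    counts = {}
    for s in scenarios:
        counts[s[t]] = counts.get(s[t], 0) + 1
    return {w: c / len(scenarios) for w, c in sorted(counts.items())}
```

The marginals obtained this way feed the Bellman computation of step (c), while the AR(1) scenarios themselves are used for the simulation of step (d).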
References
D. P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific,
Belmont, Massachusetts, second edition, 2000. Volumes 1 and 2.
M. De Lara and L. Doyen. Sustainable Management of Natural Resources. Mathematical Models and Methods. Springer-Verlag, Berlin, 2008.