Dam Optimal Management under Uncertainty

P. Carpentier, J.-P. Chancelier, M. De Lara and V. Leclère
(last modification date: April 11, 2018)


Contents

1 Problem statement and data
 1.1 Dam dynamics
 1.2 Criterion: intertemporal payoff
 1.3 Probabilistic model for water inflows
 1.4 Numerical data
2 Simulation and evaluation of given policies
 2.1 Evaluation of a turbining policy
 2.2 Simulation of a given policy
3 Resolution by the stochastic dynamic programming approach
 3.1 Stochastic dynamic programming equation
 3.2 Scilab code for the dynamic programming equation
 3.3 Simulation of the optimal policy
4 Possible extensions
 4.1 Hazard-decision framework
 4.2 Computation of a fair final value of water
 4.3 Robustness with respect to water inflow scenarios

Abstract

In this practical work, you will play the role of a dam manager. You will try to maximize the mean intertemporal payoff obtained by selling the hydropower produced by water releases, when the water inflows (rain, snowmelt, outflows from upper dams) are supposed to be random and the energy prices deterministic. You will propose different turbining policies and compute their expected payoffs. Then, you will compute the optimal payoff and display an optimal policy. Finally, you will consider different variations around this problem: highlighting the influence of the information structure by comparing the “decision-hazard” and the “hazard-decision” settings, evaluating a “fair final value of the water”, and testing robustness with respect to the inflow probability distribution.

1 Problem statement and data

We consider a dam manager intending to maximize the intertemporal payoff obtained by selling power produced by water releases, when the water inflows (rain, snowmelt, outflows from upper dams) are random and the energy prices are supposed to be deterministic.

1.1 Dam dynamics

We model the dynamics of the water volume in a dam by a function f defined as follows:

x(t + 1) = f(t, x(t), u(t), w(t + 1)) := min{ x̄, x(t) − u(t) + w(t + 1) },  t = ti, …, tf − 1,
(1)

together with the bounds on the turbined water

u̲ ≤ u(t) ≤ ū,
(2)

where
  • x(t) ∈ [x̲, x̄] is the water volume in the dam at the beginning of the period [t, t + 1[,
  • u(t) is the water volume turbined during the period [t, t + 1[,
  • w(t + 1) ≥ 0 is the (random) water inflow during the period [t, t + 1[,
  • x̄ is the maximal dam volume: water in excess of x̄ spills over and is lost.

The dam manager is supposed to make a decision at each t ∈ 𝕋, here turbining u(t) at the beginning of the period [t,t + 1[, before knowing the water inflow w(t + 1): such a setup is called the decision-hazard setting. We add the following constraint on the turbined water:

u(t) ≤ x(t).
(3)

Constraint (3) ensures that u(t) is feasible, that is, that the dam volume remains greater than or equal to its minimum value x̲ (remember that any inflow w(t) is nonnegative).

1.2 Criterion: intertemporal payoff

The manager's problem is one of payoff maximization. At each time t, the turbined water u(t) produces electricity which is sold on markets at a price p(t). The associated financial income is modeled by a linear function L(t,u) = p(t)u, leading to the instantaneous payoff

L(t, u(t)) = p(t) u(t).
(4)

Moreover, the volume x(tf) remaining in the dam at the horizon tf is valued (this is the so-called final value of water) using a quadratic function K, leading to the final payoff

K(x(tf)) = −α ( min{ 0, x(tf) − xref } )².
(5)

Here, α is a positive coefficient and xref is a given reference volume for the dam at horizon tf. The final payoff is negative if and only if x(tf) < xref. From ti to tf, the payoffs sum up to

∑_{t=ti}^{tf−1} L(t, u(t)) + K(x(tf)).
(6)

The sequence of prices (p(ti), …, p(tf − 1)) is supposed to be deterministic, hence known in advance at the initial time ti when the optimization problem is posed.

1.3 Probabilistic model for water inflows

A water inflow scenario is a sequence of inflows in the dam from time ti up to tf − 1:

w(·) := (w(ti), …, w(tf − 1)).
(7)

We model water inflow scenarios using a sequence of random variables with a known joint probability distribution such that the random variables w(ti), …, w(tf − 1) are independent, the marginal probability distribution of each w(t) being a known distribution

μw(t) on the finite set 𝕎(t),  t = ti, …, tf − 1.
(8)

Notice that the random variables (w(ti), …, w(tf − 1)) are independent, but that they are not identically distributed. Independence is a key assumption to obtain the dynamic programming equation (18) with state x.

1.4 Numerical data

Time. We consider a monthly management (the interval [t,t + 1[ represents one month) over one year, so that

ti = 1    and    tf = 13.
(9)

State and control. Concerning the dam volume and the turbinated water, we consider the following bounds:

x̲ = 0 hm3,   x̄ = 80 hm3,   u̲ = 0 hm3,   ū = 40 hm3.
(10)

We assume that the dam volume is discretized using a stepsize δx = 2 hm3, whereas the stepsize used for discretizing the control is δu = 8 hm3 (multiple of δx). Accordingly, the sets containing the possible values of the state and the control are

𝕏 = {0, 2, 4, …, 80}    and    𝕌 = {0, 8, 16, …, 40}.

The water volume in the dam at time ti is known and equal to xi = 40 hm3, and the reference volume used to define the final payoff K in (5) is xref = 40 hm3.

Prices scenario. The price scenario (p(ti), …, p(tf − 1)), known in advance, is shown in Figure 1.



Figure 1: Prices trajectory for valuing the turbinated water

Water inflow scenarios. At each time t, the range 𝕎(t) of the water inflow w(t) is obtained by discretizing an interval centered around a given mean value (see Figure 2). Each interval is discretized using a stepsize δw = 2 hm3 (multiple of δx). Consider for example time t = 1: the minimum (resp. maximum) inflow value is w̲(1) = 12 hm3 (resp. w̄(1) = 28 hm3), so that the finite support 𝕎(1) of the (uniform) probability distribution associated with the random variable w(1) is

𝕎 (1) = {12,14, 16,...,28},

the probability weight of each element w ∈ 𝕎(1) being 1∕9.

Knowing the probability distribution, it is easy to draw water inflow scenarios since we assumed that the random variables w(t) are independent.



Figure 2: Supports of the inflow probability laws (left) and some associated scenarios (right)

Scilab code. The following Scilab code contains the data and macros corresponding to the numerical example described in §1.4.

  //------------------------------
  // PROBLEM DATA
  //------------------------------
  
  // Time characteristics
  // --------------------
  
  // Time horizon
  Ti=1;// Don't change this value!
  Tf=13;
  
  // Dam characteristics
  // -------------------
  
  xmin=0;
  xmax=80;
  xini=40;
  xref=40;
  
  // Dam discretization
  xdlt=2;// Integer: xmin, xmax and xini are multiples of xdlt
  Nx=((xmax-xmin)/xdlt)+1;
  
  // Control characteristics
  // -----------------------
  
  umin=0;// Don't change this value!
  umax=40;
  
  // Control discretization
  udlt=8;// Integer, multiple of xdlt: umin and umax are multiples of udlt
  Nu=((umax-umin)/udlt)+1;
  
  // Electricity prices
  // ------------------
  
  prices=[48.0,47.0,87.0,37.0,35.0,40.0,29.0,16.0,33.0,38.0,48.0,36.0];
  
  // Inflows characteristics
  // -----------------------
  
  // Inflow discretization
  wdlt=2;// Integer, multiple of xdlt
  
  // Inflow mean values and maximal variations
  wexp=[20.0,24.0,16.0,12.0,08.0,04.0,04.0,10.0,16.0,18.0,30.0,20.0];
  wect=[08.0,16.0,08.0,08.0,04.0,02.0,02.0,08.0,10.0,12.0,20.0,10.0];
  
  // Probability laws of the inflows (uniform law with finite support)
  wlaws=list();
  plaws=list();
  for t=Ti:Tf-1 do
    wlaws(t)=[(wexp(t)-wect(t)):wdlt:(wexp(t)+wect(t))];
    plaws(t)=ones(wlaws(t))/length(wlaws(t));
  end
  
  // Macro generating Ns inflow scenarios
  function wscenarios=inflow_scenarios(Ns,wlaws,plaws)
    // Random number generator
    grand("setgen","clcg2");
    grand("setsd",12345,67890);
  
    // Scenario generation
    wscenarios=zeros(Ns,Tf-1);
    for t=Ti:Tf-1 do
      // Probability law at t
      wlawt=wlaws(t);
      plawt=plaws(t);
      // Probability transition matrix for uniform sampling
      // We use grand(...,"markov",...,...) to generate independent sequences
      proba=ones(length(plawt),1)*plawt;
      samples=grand(Ns,"markov",proba,1);
      wscenarios(:,t)=wlawt(samples)';
    end
  endfunction

Question 1

a)
[1] Copy this code into a file DamManagement.sce and check that it corresponds to the data given in §1.4.
b)
[2] Launch Scilab and execute the file DamManagement.sce. Use the macro inflow_scenarios to generate 100 scenarios of water inflow. Plot the first 20 scenarios and compare them to the ones given in Figure 2.
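
For instance, Question 1b could be carried out along the following lines (a minimal sketch; the window number and the number of plotted scenarios are arbitrary choices):

  // Generate 100 inflow scenarios and plot the first 20 of them
  Ns=100;
  wscenarios=inflow_scenarios(Ns,wlaws,plaws);
  
  xset("window",100);
  clf();
  plot2d([Ti:Tf-1],wscenarios(1:20,:)');
  xgrid;
  xtitle('Water inflows','Time','Inflow');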

2 Simulation and evaluation of given policies

In order to simulate (and then optimize) the dam behavior, we need to have the Scilab macros computing the dynamics (1) and the payoffs (4) and (5).

  //------------------------------
  // DAM MACROS
  //------------------------------
  
  
  function xn=dynamics(t,x,u,w)
    // Dynamics of the dam
    xn=min(xmax,x-u+w);
  endfunction
  
  function payoff=instant_payoff(t,u)
    // Instantaneous payoff at time t
    payoff=prices(t)*u;
  endfunction
  
  function payoff=final_payoff(x)
    // Final payoff at horizon (expression (5) with alpha = 1)
    payoff=-min(0,x-xref) .^2;
  endfunction
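
As a quick sanity check of these macros against (1), (4) and (5), one may evaluate them on a few scalar values (the values below are arbitrary; expected results are given in comments):

  // dynamics(1,40,8,20)    // min(80,40-8+20) = 52
  // dynamics(1,70,0,30)    // min(80,70-0+30) = 80: the excess water spills over
  // instant_payoff(1,8)    // 48.0*8 = 384
  // final_payoff(30)       // -(min(0,30-40))^2 = -100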

Question 2 [2] Copy the above Scilab code at the end of the file DamManagement.sce and check that it corresponds to the expressions (1), (4) and (5). Notice that these macros are written in such a way that they may be fed either with scalar values or with vector values. This feature will be used for efficiently simulating a bunch of inflow scenarios (see Question 3).

2.1 Evaluation of a turbining policy

An admissible policy γ : 𝕋 × 𝕏 → 𝕌 assigns a turbined water amount u = γ(t,x) ∈ 𝕌 to any time t ∈ 𝕋 and to any dam volume x ∈ 𝕏, while respecting constraint (3), that is,

u ≤ x.

Hence, by (2), we obtain that

u ≤ min{x, ū}.

Given an admissible policy γ and given an inflow scenario

w(·) = (w(ti), …, w(tf − 1)),
(11)

we are able to build a dam volume trajectory

x(·) := (x(ti), …, x(tf − 1), x(tf))
(12)

and a turbinated water trajectory

u(·) := (u(ti), …, u(tf − 1))
(13)

produced by the “closed-loop” dynamics initialized at the initial time ti by

x(ti) = xi,
(14a)
u(t) = γ(t, x(t)),  t = ti, …, tf − 1,
(14b)
x(t + 1) = f(t, x(t), u(t), w(t + 1)),  t = ti, …, tf − 1.
(14c)

We also obtain the payoff associated to the inflow scenario w()

Jγ(ti, xi, w(·)) := ∑_{t=ti}^{tf−1} L(t, u(t)) + K(x(tf)),
(15)

where x() and u() are given by (14c). The expected payoff associated with the policy γ is

𝔼 [ Jγ(ti, xi, w(·)) ],
(16)

where the expectation 𝔼 is taken with respect to the product probability ℙ, whose marginals are given by (8). The true expected value (16) is difficult to compute, and we evaluate it by the Monte Carlo method using Ns inflow scenarios (w1(·), …, wNs(·)):

(1 ∕ Ns) ∑_{s=1}^{Ns} Jγ(ti, xi, ws(·)).
(17)

By the law of large numbers, the mean payoff (17) is a “good” approximation of the expected payoff (16) if the number of scenarios is “large enough”.

We propose the following Scilab code in order to evaluate the mean payoff associated to a policy given by the Scilab macro policy.

  //------------------------------
  // SCENARIO BASED SIMULATOR
  //------------------------------
  
  function [xscenarios,uscenarios,cscenarios]=simulation(wscenarios,policy)
    // Initialization
    xscenarios=zeros(Ns,Tf);// used to store the state trajectories
    uscenarios=zeros(Ns,Tf-1);// used to store the control trajectories
    cscenarios=zeros(Ns,1);// used to store the payoff values (one per scenario)
  
    // Simulation in forward time
    xscenarios(:,1)=xini*ones(Ns,1);
    for t=Ti:Tf-1 do
      x=xscenarios(:,t);
      u=policy(t,x);
      w=wscenarios(:,t);
      xn=dynamics(t,x,u,w);
      cscenarios=cscenarios+instant_payoff(t,u);
      xscenarios(:,t+1)=xn;
      uscenarios(:,t)=u;
    end
    cscenarios=cscenarios+final_payoff(xn);
    printf('\n Mean payoff (Monte Carlo): %f\n',mean(cscenarios));
  endfunction

Question 3

a)
[1] Copy the above Scilab code at the end of the file DamManagement.sce. Check that it corresponds to the expressions (14c), (15) and (17) for a given policy γ (input argument policy of the macro simulation).
b)
[2] Explain in detail how the macro simulation computes these quantities (see Question 2).

2.2 Simulation of a given policy

In order to test the Scilab macro simulation given above, consider the following code.

  //------------------------------
  // HEURISTIC POLICY EVALUATION
  //------------------------------
  
  function u=heuristic_policy(t,x)
    // Heuristic policy
    thresmin=xref;
    thresmax=xref;
    u=(umax*bool2s(x > thresmin))+(umin*bool2s(x <= thresmax));
    u=min(u,x);
  endfunction
  
  // Scenarios generation
  Ns=10000;
  wscenarios=inflow_scenarios(Ns,wlaws,plaws);
  
  // Simulation
  [xscenarios,uscenarios,cscenarios]=simulation(wscenarios,heuristic_policy);
  
  // Payoff histogram
  xset("window",101);
  clf();
  histplot(50,cscenarios,normalization = %t,style = 5);
  xgrid;
  xtitle("Payoff distribution");
  
  // State trajectories
  xset("window",102);
  clf();
  plot2d([Ti:Tf],xscenarios(1:20,:)');
  xgrid;
  xtitle('Dam volume','Time','State');
  
  // Control trajectories
  xset("window",103);
  clf();
  plot2d([Ti:Tf-1],uscenarios(1:20,:)');
  xgrid;
  xtitle('Turbined water','Time','Control');

Question 4

(a)
[2] Explain the control policy induced by the Scilab macro heuristic_policy. Copy the relevant part of the above Scilab code at the end of DamManagement.sce, then generate 10,000 inflow scenarios and simulate the policy heuristic_policy along them.
(b)
[2] Copy the relevant part of the above Scilab code at the end of DamManagement.sce, then plot some trajectories corresponding to the dam volume and to the turbined water obtained using this policy, and provide a histogram of the payoff distribution. Comment on the figures thus obtained.

We now want to design new policies, writing Scilab macros following the model given by the macro below:

  function u=my_policy(t,x)
    u=max(umin,min(WRITE_A_FORMULA_HERE,min(x,umax)));
  endfunction
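
For instance (illustration only, this is not one of the policies asked for in Question 5 below), a policy trying to release a constant target volume at every period could be written as follows; the macro name constant_policy and the target value of 16 hm3 are arbitrary choices:

  function u=constant_policy(t,x)
    // Release a constant target volume, clipped to the admissible controls
    u=max(umin,min(16,min(x,umax)));
  endfunction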

Question 5

(a)
[2] Explain why we use the expression max(umin,min( ..... , min(x,umax))) in the macro my_policy above.
(b)
[1+1+2] Write the Scilab code of the following policies:
  • always turbinate at the maximum possible (myopic),
  • turbinate a fraction of the maximum possible,

and evaluate them thanks to the scenario based simulator.

(c)
[2] Propose a policy which depends on the prices knowledge, and explain how you designed this policy.
(d)
[2] Evaluate this last policy, and compare it to the ones designed in item (b).

3 Resolution by the stochastic dynamic programming approach

In order to compute the optimal turbinated water policy, we provide the stochastic dynamic programming (or Bellman) equation associated with the problem of maximizing the expected payoff (16), as well as the corresponding Scilab code.

3.1 Stochastic dynamic programming equation

Since the water inflows w(t) are supposed to be independent random variables, Dynamic Programming applies, and the equation associated with the problem of maximizing the expected payoff (16) reads

V(tf, x) = K(x),  ∀x ∈ 𝕏,
V(t, x) = max_{u ∈ 𝕌, u ≤ x} 𝔼 [ L(t, u) + V(t + 1, f(t, x, u, w(t + 1))) ],  ∀x ∈ 𝕏,
(18)

for t varying from tf − 1 down to ti. The expectation 𝔼 is taken with respect to the marginal probability distribution of w(t + 1) given by (8), that is,

𝔼 [ L(t, u) + V(t + 1, f(t, x, u, w(t + 1))) ] = (1 ∕ card(𝕎(t))) ∑_{w ∈ 𝕎(t)} [ L(t, u) + V(t + 1, f(t, x, u, w)) ].

A difficulty when programming the Bellman equation is that we need to manipulate both the values (t,x) of the time and of the state, and the associated indices (t,i) in the table storing the values of the function V(t,x). In our application, the treatment of the index t is obvious because t ∈ 𝕋 = {ti, ti + 1, …, tf}, but we need a tool performing the conversion for the volume. The following Scilab macro state_index returns the index i associated with an element x of the finite set 𝕏 containing the possible values of the dam volume.

  function i=state_index(x)
    // Index associated with a volume value
    i=dsearch(x,[xmin:xdlt:xmax],'d');
    if i==0 then
      printf('\n Invalid dam volume: %f\n',x);
    end
  endfunction
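
For instance, with the numerical data of §1.4 (a quick check; the values are arbitrary):

  // state_index(0)          // returns 1  (first element of [0:2:80])
  // state_index(40)         // returns 21
  // state_index([0 40 80])  // returns [1 21 41]: dsearch, hence state_index, also accepts vectors of volumes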

Question 6 [2] Copy the above Scilab code at the end of the file DamManagement.sce. Explain in detail how the macro works.

3.2 Scilab code for the dynamic programming equation

We provide the following code in order to compute the Bellman function.

  //-------------------------------
  // STOCHASTIC DYNAMIC PROGRAMMING
  //-------------------------------
  
  // Initialization
  Vbell=-ones(Tf,Nx)*%inf;// Used to store the optimal payoff functions
  Ubell=ones(Tf-1,Nx)*%inf;// Used to store the optimal controls
  
  // SDP computation by backward induction
  function [Vbell,Ubell]=sdp_solve()
    // Bellman function at final time
    payoff=final_payoff([xmin:xdlt:xmax]);
    Vbell(Tf,:)=payoff';
    // Loop backward in time
    for t=Tf-1:-1:Ti do
      wlawt=wlaws(t);
      plawt=plaws(t);
      Nw=length(wlawt);
      for x=xmin:xdlt:xmax do
        i=state_index(x);
        for u=[umin:udlt:min(x,umax)] do
          ctot=0;
          for k=1:Nw do
            w=wlawt(k);
            p=plawt(k);
            xn=dynamics(t,x,u,w);
            in=state_index(xn);
            ctot=ctot+(p*Vbell(t+1,in));
          end
          ctot=ctot+instant_payoff(t,u);
          if ctot > Vbell(t,i) then
            Vbell(t,i)=ctot;
            Ubell(t,i)=u;
          end
        end
      end
    end
  endfunction

Question 7

(a)
[2] Copy the above Scilab code at the end of the file DamManagement.sce. Explain in detail why the macro sdp_solve corresponds to the computation of the Dynamic Programming equation (18).
(b)
[1] Execute the macro sdp_solve, which generates the table Vbell containing the values of the Bellman function and the table Ubell containing the values of the optimal controls. Display the value of the Bellman function at (ti,xi).
(c)
[2] Plot different Bellman functions and comment on their shape.
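
A possible way to carry out items (b) and (c) is sketched below (the window number and the selection of times whose Bellman functions are plotted are arbitrary choices):

  // Solve the Bellman equation and display the value at (Ti,xini)
  [Vbell,Ubell]=sdp_solve();
  printf('\n Bellman value at (Ti,xini): %f\n',Vbell(Ti,state_index(xini)));
  
  // Plot the Bellman functions at a few selected times
  xset("window",104);
  clf();
  plot2d([xmin:xdlt:xmax],Vbell([1,4,7,10,13],:)');
  xgrid;
  xtitle('Bellman functions','Volume','Value');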

3.3 Simulation of the optimal policy

Consider the following Scilab macro.

  function u=optimal_policy(t,x)
    i=state_index(x);
    u=Ubell(t,i)';
  endfunction
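
Since optimal_policy has the same calling sequence as heuristic_policy, it can be fed directly to the scenario-based simulator of Section 2 (a sketch, reusing the scenarios generated in §2.2):

  [xscenarios,uscenarios,cscenarios]=simulation(wscenarios,optimal_policy);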

Question 8

(a)
[2] Explain why the control policy induced by the Scilab macro optimal_policy is optimal.
(b)
[1] Generate 10,000 water inflow scenarios and, using the macro simulation, simulate the policy optimal_policy along them. Plot some trajectories corresponding to the dam volume and to the turbined water obtained using this policy.
(c)
[2] Provide a histogram of the distribution of the random payoff, and compare the mean payoff (17) with the value V(ti,xi) of the Bellman function evaluated at (ti,xi).
(d)
[3] Compare these results with those obtained using all previous policies.

Here we assume that the values of the optimal control have not been stored in the table Ubell, so that the only output of the macro sdp_solve is the table Vbell (Bellman values).

Question 9 [2] Provide a method and the corresponding code to compute the optimal value of the control at time t for a given dam volume x(t) using only the values stored in the table Vbell at time t + 1.

Hint: the optimal control at (t,x(t)) is the arg max of the following optimization problem:

max_{u ∈ 𝕌, u ≤ x(t)} 𝔼 [ L(t, u) + V(t + 1, f(t, x(t), u, w)) ].
(19)

Be careful that your code must accept a vector of volumes as input.
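
One possible answer is sketched below (not necessarily the intended solution): for each volume of the input vector x, the expectation in (19) is evaluated for every admissible control, and the best control is kept. The macro name vbell_policy is an arbitrary choice.

  function u=vbell_policy(t,x)
    // Optimal control at (t,x) computed from the Bellman values at time t+1
    wlawt=wlaws(t);
    plawt=plaws(t);
    Nw=length(wlawt);
    u=zeros(x);// one control per entry of the volume vector x
    for s=1:length(x) do
      best=-%inf;
      for uu=[umin:udlt:min(x(s),umax)] do
        // Expected payoff of the candidate control uu
        ctot=instant_payoff(t,uu);
        for k=1:Nw do
          xn=dynamics(t,x(s),uu,wlawt(k));
          ctot=ctot+(plawt(k)*Vbell(t+1,state_index(xn)));
        end
        if ctot > best then
          best=ctot;
          u(s)=uu;
        end
      end
    end
  endfunction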

4 Possible extensions

In this section, we propose several developments and extensions for the dam management problem.

4.1 Hazard-decision framework

In the hazard-decision framework, the dam manager is supposed to make a decision, here turbining u(t) at the beginning of the period [t,t + 1[, knowing the water inflow w(t + 1) in advance.

The Bellman equation is now

V(tf, x) = K(x),  ∀x ∈ 𝕏,
V(t, x) = 𝔼 [ max_{u ∈ 𝕌, u ≤ x} ( L(t, u) + V(t + 1, f(t, x, u, w(t + 1))) ) ],  ∀x ∈ 𝕏.
(20)

Once the value function V(t,x) has been evaluated and stored for every time t and state x, the optimal control at time t, starting from state x and observing the water inflow w, is the arg max of

max_{u ∈ 𝕌, u ≤ x} [ L(t, u) + V(t + 1, f(t, x, u, w)) ],
(21)

which leads to an optimal policy depending on the triplet (t,x,w).
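
To give an idea of the required change in the code, the inner loops of sdp_solve could be reorganized as sketched below (keeping the notation of §3.2): the maximization over u is now performed inside the expectation, for each possible inflow value.

  // Hazard-decision backup at a given time t and volume x (of index i):
  // the max over u is computed for each inflow value w, then averaged
  ctot=0;
  for k=1:Nw do
    w=wlawt(k);
    p=plawt(k);
    best=-%inf;
    for u=[umin:udlt:min(x,umax)] do
      xn=dynamics(t,x,u,w);
      best=max(best,instant_payoff(t,u)+Vbell(t+1,state_index(xn)));
    end
    ctot=ctot+(p*best);
  end
  Vbell(t,i)=ctot;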

Question 10

(a)
[1] Explain the difference between (20) and (18).
(b)
[2] Adapt the Scilab macro sdp_solve to solve the hazard-decision Bellman equation (20). We decide not to store the optimal policy in the table Ubell.
(c)
[1+2] Based on (21), write the Scilab code which computes the optimal control as a function of (t,x,w), and use it to obtain by simulation the optimal trajectories of the state and the control as well as the optimal payoff distribution. Compare.

4.2 Computation of a fair final value of water

Till now, the gain in leaving water in the dam at the end of the time horizon was arbitrarily fixed. Now, we provide a procedure to estimate a “fair final value of water”, that is, a function x ↦ K(x) as in (5).

The intuition behind the procedure is that the final value of water is the value that a manager would put on the dam, were he to run it from the date tf + 1 to infinity. The final value of water is the solution of an infinite horizon maximization problem. The procedure below mimics an algorithm to find a fixed point by successive iterations.

First, we start with a zero final payoff function K(1)(·) ≡ 0 and obtain, by backward induction, the Bellman function V(1)(ti, ·) at initial time ti. Up to a translation (to account for the fact that an empty dam has zero final value of water), we identify the function V(1)(ti, ·) − V(1)(ti, x̲) with the new final value of water K(2)(·). Proceeding along, we expect that this loop converges towards a function x ↦ K(∞)(x) which is a good candidate for the final value of water.

We design a loop for n = 1, …, N, starting with a zero final value of water

K(1)(x) = 0,  ∀x ∈ 𝕏,

then solving the backward Bellman equation

V(n)(tf, x) = K(n)(x),  ∀x ∈ 𝕏,
V(n)(t, x) = max_{u ∈ 𝕌, u ≤ x} 𝔼 [ L(t, u) + V(n)(t + 1, f(t, x, u, w(t + 1))) ],  ∀x ∈ 𝕏,  t = tf − 1, …, ti,

and then closing the loop by choosing the new final value of water

K(n+1)(x) = V(n)(ti, x) − V(n)(ti, x̲),  ∀x ∈ 𝕏,
(22)

as the Bellman function at the initial time ti after a shift, so that K(n+1)(x̲) = 0.
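
A minimal sketch of this fixed-point loop is given below. It assumes a variant sdp_solve_K(Kfinal) of the macro sdp_solve in which the vector Kfinal of final payoff values is used in place of final_payoff([xmin:xdlt:xmax]) and only Vbell is returned; the number of iterations N is an arbitrary choice.

  // Fixed-point iterations on the final value of water
  N=20;
  Kfinal=zeros(1,Nx);// K^(1) is identically zero
  for n=1:N do
    Vbell=-ones(Tf,Nx)*%inf;// reset before each backward induction
    Vbell=sdp_solve_K(Kfinal);
    // Shift so that an empty dam has zero final value of water
    Kfinal=Vbell(Ti,:)-Vbell(Ti,state_index(xmin));
  end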

Question 11 Implement the iterative procedure described above, and compute and plot the “fair final value of water” in the dam.

4.3 Robustness with respect to water inflow scenarios

An interesting point, which occurs in practical problems, is the following. In the case where the random variables (w(ti), …, w(tf − 1)) are correlated in time, it is always possible to obtain the marginal probability law of each w(t) and then compute the Bellman function as in §3. We thus obtain a feedback policy that would be optimal if the random inflow variables were uncorrelated. The loss of optimality may be evaluated by simulating the feedback policy along scenarios drawn using the “true” probability law of the process (w(ti), …, w(tf − 1)).

Question 12

(a)
Draw inflow scenarios according to an auto-regressive AR(1) process.
(b)
Compute the marginal probability distributions of the w(t) induced by these scenarios.
(c)
Compute the Bellman solution of the problem using these marginal probability laws and the associated optimal policy.
(d)
Using this policy, run a simulation along scenarios drawn according to the AR(1) process.
(e)
Evaluate the loss of optimality induced by ignoring the correlation in the inflow process.
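
For item (a), one possible model is sketched below: the inflow at time t fluctuates around the mean value wexp(t) with an AR(1) deviation, and is then projected onto the interval and the grid of §1.4. The macro name ar1_scenarios, the correlation coefficient rho and the noise scaling are arbitrary modeling choices. The resulting scenarios can then be fed to the macro simulation in place of those produced by inflow_scenarios.

  // Sketch: inflow scenarios correlated in time through an AR(1) deviation
  function wscenarios=ar1_scenarios(Ns,rho)
    grand("setgen","clcg2");
    grand("setsd",12345,67890);
    wscenarios=zeros(Ns,Tf-1);
    dev=zeros(Ns,1);// deviation from the mean inflow
    for t=Ti:Tf-1 do
      noise=grand(Ns,1,"unf",-wect(t),wect(t));
      dev=rho*dev+(1-rho)*noise;
      w=wexp(t)+dev;
      // Project onto the interval and the grid used in Section 1.4
      w=max(wexp(t)-wect(t),min(wexp(t)+wect(t),w));
      w=wdlt*round(w/wdlt);
      wscenarios(:,t)=w;
    end
  endfunction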
