Fermer X

Subway stations optimal energy management

P. Carpentier, J.-P. Chancelier, M. De Lara, V. Leclère and T. Rigaut
(last modification date: October 10, 2017)
Version pdf de ce document
Version sans bandeaux



In this practical work, you will have to manage a battery in order to recover subways braking electrical energy and to supply it to a subway station. Your goal is to minimize the cost of energy consumed on the national grid by exploiting optimally this regenerative braking energy. First you will compute the gains realized when the battery is controlled according to an intuitive strategy. Then you will compare two methods, Stochastic Dynamic Programming and Model Predictive Control. Both methods provide a control for the battey at each time step these methods don’t use the same amount of information about the past randomness realization. And they don’t use their information the same way. You will compare throughout this work different ways to use past information to compute controls aware of the future uncertainties.

1 Problem statement and data

Subway stations represent one third of the Paris subway system energy consumption. There are unexploited energy ressources in subway stations that could be harvested to lower this energy consumption. We could for example recover braking energy from the trains. Currently trains can dissipate their kinetic energy by braking mechanically or convert this energy into electricity. However supply/demand equilibrium have to be ensured on the subway line. Hence a train can brake electrically only if there is a train accelerating/consuming energy nearby. There is therefore a significant amount of potential energy that is currently wasted. We could overcome this problem with an electrical storage.

We consider a subway station equiped with a battery that can charge braking energy and national grid energy on one side (subway line side) and discharge energy to power the subway station which is also powered by the national grid (subway station side). A battery manager has to control this battery in real time. The station demand and the cost of electricity are assumed deterministic. Braking energy available at each time step is uncertain.

PICPICPICPICUt D tUt +W t+1Et s E t+1l

Figure 1: Electrical problem representation
Power Line Icon made by Freepik from www.flaticon.com is licensed by Creative Commons BY 3.0

We present hereby the stochastic optimization problem we want to solve:


We detail all notations and equations in the following.

1.1 Battery dynamics

The battery can store power coming from the trains or the national grid and supply power to the subway station. It cannot do both things at the same time. We control its state of charge (soc). We model the dynamics of the battery at each time step t by a function ft we call the dynamic at time t giving equation (1b) :



  • t 𝕋 := {0, 1,,T 1} is discrete (minutes in this case) and t denotes the beginning of the period [t,t + 1[. We call T the horizon of the problem.
  • St state of charge (kWh) of the battery at the beginning of period [t,t + 1[. It belongs to a set 𝕏 = [Smin,Smax] representing the minimum and maximum state of charge of the battery.
  • Ut energy (kWh) of battery charge or discharge during [t,t+1[, decided at the beginning of time period [t,t + 1[. It belongs to a space 𝕌 = [Umin,Umax] modeling ramp limits for the charge or discharge.
  • Ut = min (0,U t) energy we discharge from the battery
  • Ut+ = max (0,U(t)) energy we charge into the battery
  • ρ is the charge and discharge efficiency (%)

The initial state of charge of the battery is called x0 and appears in the equality constraint (1g)

1.1.1 Variables definition

We define hereby the remaining variables of the problem presented in figure1

  • Ets energy (kWh) we pay to the national grid to power the station during [t,t + 1[
  • Etl energy (kWh) we pay to the national grid to charge the battery during [t,t + 1[
  • Dt energy (kWh) consumed by the subway station during [t,t + 1[
  • Wt braking energy (kWh) available on the subway line during [t,t + 1[. This quantity is a random variable

We observe that charge/discharge decision Ut is taken at the beginning of [t,t + 1[, before knowing the available braking energy Wt. We are in a decision-hazard setting.

1.1.2 Constraints

We have to ensure the supply/demand equilibrium at each node and each time step on the subway station node which leads to equation:

Dt − U − = Es
      t      t

On the subway line node we have to ensure supply/demand balance however we are not forced to recover all the regenerative braking energy Wt. We have then an inequality constraint (1d):

U+ ≤  El + Wt
 t     t

We prevent the battery manager to sell power to the grid then we have to ensure:


1.2 Criterion: intertemporal payoff

At each time t we pay to the national grid operator the quantity Etl + E ts of energy. We assume that the price of electricity ct per kWh is known in advance as it is negociated and constant during [t,t + 1[.

At each time t we pay the instantaneous cost:

Lt(St,Ut,Wt ) = ct(Elt + Est)

Using equation (2) we can eliminate the variable Ets leading to the instantaneous cost:

                    l          −
Lt(St,Ut,Wt ) = ct(Et + Dt − U t )

At the end of the horizon we have no preference concerning the final state of charge of the battery. We introduce then a final cost function K : 𝕏 such as:

∀ST  ∈ 𝕏, K (ST ) = 0

From 0 to T the payoffs sum up to:

T∑−1     l         −
   ct(E t + Dt − U t ) + K (ST )

The manager wants to minize these payoffs every day without knowing the future availability of braking energy. He will therefore minimize them on average, hopefully obtaining the best mean payoff he can obtain after several days. The manager wants to minimize the expected payoff:

𝔼 [   c (El + D  − U −) + K (S  )]
       t  t     t    t        T

The manager wants to solve problem (1). We detail the last constraint (1i) in the following.

1.3 Numerical data and model

1.3.1 Constants

We define, in file constants.jl, some numerical data required to model the problem.

  dt = 15*60. #time step, seconds
  T0 = 24*3600 #horizon, seconds
  T = floor(Int,T0/dt) #number of stages
  hour_stages = floor(Int, T/24.) #number of stages in 1 hour
  U_p_max = 100 # ramp upper limit for battery charging, kW
  U_m_max = 100 #ramp upper limit for battery discharging, kW
  Cap = 40 #battery maximal capacity, kWh
  X_max = 0.9 * Cap #battery operational upper capacity, kWh
  X_min = 0.3 * Cap #battery operational lower capacity, kWh
  rho = 0.95 #efficiency
  D = 100+5*rand(T-1)-5*rand(T-1) #demand, kWh;
  D = floor(D)
  C = 0.06*ones(T-1)*dt/3600. #cost of electricity, euros/kWh;
  C[7*hour_stages:9*hour_stages] = 0.08*dt/3600.
  C[16*hour_stages:19*hour_stages] = 0.08*dt/3600.
  C[9*hour_stages:16*hour_stages] = 0.07*dt/3600.
  X_0 = 0.35 * Cap #initial battery state of charge, kWh
  U_bounds = [(-U_m_max,U_p_max)] #battery control bounds, kWh
  X_bounds = [(X_min, X_max)] #battery state of charge bounds, kWh

Question 1 Comment each line by defining the constant after the #. What is the number of time steps of the problem? The number of time steps in one hour? Include the file constants.jl in your script or notebook. Plot the cost of electricity as well as the consumption of the station during the day. Compute the cost of the station’s energy consumption over the whole time horizon when it consumes exclusively on the national grid.

1.3.2 Braking energy scenarios

We provide a code function, defined in scenarios_generation.jl , that allows to compute a given number of braking energy scenarios over the time horizon.

  Compute n scenarios of braking energy
  # Arguments
  * `Int`:
      the number of scenarios wanted
  # Return
  * Array{Float64} : array of dim 3 and size (T-1, n, 1)
  containing n scenarios of W_{t} realizations
  * Array{Float64} : array of dim 3 and size (T-1, n, 1)
  containing n scenarios of xi_{t} realizations 
  function generate_braking_scenarios(n)

Question 2 Include scenarios_generation.jl in your script or notebook. Use this function to generate 50 regenerative braking scenarios and save the 2 outputs. The first output contains the braking energy scenarios, we leave the second output until the end. Based on this 50 scenarios what is the average energy available over a day? We will call these 50 scenarios the assessment scenarios thoughout the work. We will refer to the second output as the ξ-assessment scenarios.

1.3.3 Problem functions declaration

We define here 4 code functions required to model the problem.

  function cost(t,x,u,w)
      return C[t]*(D[t]+max(0,u[1]-w[1])+min(u[1],0))
  function constraints(t,x,u,w)
      return (D[t]+min(u[1],0)>=0)
  function dynamics(t, x, u ,w)
      return [x[1] + dt./3600*(rho*max(u[1],0) + (1/rho)*min(u[1],0))]
  function final_cost(x)

Question 3 What is the number of state variables? What is the number of control variables? Verify that the code functions correspond the real functions of the problem.

2 Evaluation with given policies

The manager wants to be able to make a decision that takes into account the future uncertainties but that use the information about the past uncertainties realization. This is the meaning of the last constraint (1i) of problem (1) we did not yet explained.

2.1 Non-anticipativity constraint

The constraint Ut = πt(W0,...,Wt1) states that the decision at time t, Ut, must depend only on the past uncertainties of the problem W0,...,Wt1.

When the Wt are time independent we can restrict the search to functions of the current state of the problem. The non anticipativity constraint becomes Ut = πt(St). When the Wt are not time independent we can still restrict the search to functions of the state. The controls obtained are not guaranteed to be optimal but they may be good enough and are often easier to compute.

An admissible state feedback policy π : 𝕋 × 𝕏 𝕌 assigns a charge or discharge instruction  Ut = πt(St) 𝕌 to any time t 𝕋 and to any state of charge St 𝕏, while respecting the problem constraints. Given an admissible policy π and given a braking energy scenario W() = (W0,,WT1), we are able to build a battery state of charge trajectory S() := (S0,,ST1,ST ) and a charge/discharge control trajectory U() := (U0,,UT1) produced by the “closed-loop” dynamics initialized at the initial time 0 by


We also obtain the payoff associated to the braking energy generation scenario W1,...,WT :

                T∑ −1
V π(S0,W (⋅)) :=    Lt(St,Ut,Wt+1 ) + K (ST),
 0               t=0

where S() and U() are given by (5).

The expected payoff associated with the policy π is

𝔼V 0 (S0, W (⋅)),

where the expectation 𝔼 is taken with respect to the product probability . The true expected value (7) is difficult to compute,1 and we evaluate it by the Monte Carlo method using Ns braking energy scenarios (W1(),,WNs()):

-1-∑ s   π       s
N      V0 (S0,W  (⋅)).
  s s=1

By the law of large numbers, the mean payoff (8) is a “good” approximation of the expected payoff(7) if the number of scenarios is “large enough”.

2.2 Simulation of a given heuristic

We propose here a code function providing a heuristic strategy base only on state and time information. At time t this function return a control ut knowing t and xt. This code function is then a state feedback. Its idea is to charge during peak hours and discharge during off-peak hours while ensuring the state and control constraints.

  function my_heuristic(t, x_t)
      U = 0
      #morning peak hours
      if (t>=7*hour_stages)&&(t<=9*hour_stages)
          U = max(rho*(X_min-X_max)/(2*hour_stages+1),
      #evening peak hours
      elseif (t>=16*hour_stages)&&(t<=19*hour_stages)
          U = max(rho*(X_min-X_max)/(3*hour_stages+1),
      else #off-peak hours
          U = min((X_max-x_t)/rho*3600/dt, U_p_max)

Question 4 What are the peak and off-peak hours? What is the sign of the control returned by this heuristic according to the time at which it is computed? Explain the logic of this heuristic.

We provide a code function, in policy_assessment.jl, to assess a policy along multiple scenarios knowing the instantaneous cost, the final cost, the dynamic, the scenarios, the initial state of the system and the policy.

  Assess a policy by simulation of multiple scenarios to compute the expectation of the cost
  # Arguments
  * `Array{Float64}`:
      initial state of the system
  * `Int`:
      number of stage in the problem
  * `Function`:
      Instantaneous cost function
  * `Function`:
      Final cost function
  * `Function`:
      Dynamics function
  * `Function`:
      Policy, function of the time and state returning a control
  * `Array{Float64}`:
      Noises scenarios along which we want to simulate
  # Return
  * `Float64`:
      Mean cost obtained by simulation of the heuristic over multiple scenarios
  * `Float64`: dim 3 with size (T, dim(X_t), N_scenarios)
      Array containing the state trajectories
  function assess_policy(initial_state, stages_number, inst_cost, fin_cost, dyn,

Question 5 Use this function with the previous heuristic on each on the assessment scenarios. What is the average cost of energy consumption on the national grid obtained? Compare with the origanl cost when there was no battery. Plot the battery state of charge for some scenarios. Does it correspond to your previous understanding of the heuristic?

3 Stochastic Dynamic Programming resolution

Stochastic Dynamic Programming is a method to solve Stochastic Optimal Control problems. It is based on the concept of value functions and the so called Bellman’s principle of optimality. The value function of the problem at time t is defined as:


With K the final cost of the problem which is zero in our case. The principle of optimality states that, when the problem noises are independent, we can write the following recursion between value functions:

Vt(St ) = min 𝔼[Lt(St,Ut,Wt+1 ) + Vt+1 (ft(St,Ut,Wt+1 ))]

In practice we use equation (9) in two phases: an offline stage where we use it to compute value functions, and an online stage where these functions are used to compute a control.

Offline phase: computation of value functions   
For t from T to 0 compute

Vt(St) = min  𝔼[Lt(St,Ut,Wt+1 ) + Vt+1 (ft(St,Ut,Wt+1 ))]

Online: computation of a control   
At time t, knowing the state of the system St compute Ut:

Ut ∈ arg min 𝔼 [Lt(St,U, Wt+1 ) + Vt+1(ft(St,U, Wt+1))]
       U ∈𝕌

In practice we compute the value functions on a discretized state space 𝕏d. As it is not possible to iterate over a continuous set. Similarly we solve problems (19) and (20) by discretizing 𝕌 and testing every control in 𝕌d the discrete space obtained.

3.1 Markovian assumption, state and control discretization

In order to compute the expectation in problems (19) and (20) we assume that the noises are time independent and use 1000 braking energy scenarios to compute the marginal laws of the noises at each time step. We do not use the assessment scenarios in order not to bias the results. The discrete marginal laws are obtained by quantization as Monte Carlo should be avoided as much as possible with Dynamic Programming.

  #we build the marginal laws
  w_laws = build_noises_laws(N_W, 10);

We choose the size of the discretization step for the state variable and the control variable.

  U_steps = [2.] # controls discretization, kW
  X_steps = [1.] # states discretization, kWh

Question 6 What is the number of operations required to compute the value functions of the problem?

3.2 Value functions offline computation

We provide a function to compute the value functions of the problem.

  V_sdp = StochDynamicProgramming.solve_DP(battery_management_sdp_model,
   params_sdp, 1);

Question 7 Plot the value function at time 3 over the state of charge of the battery using the function provided in plotting_functions.jl. Why is it decreasing? Plot the value function at time 4300. Why does it display a plateau?

3.3 Using value functions for online control computation

We provide a function, in file sdp_policies.jl to generate a control at time t knowing the previous uncertainty realization, the state of the system and the time step.

  Obtain a control for the battery management problem at time t 
  knowing state x_t using value functions
  # Arguments
  * `Int`:
      time step t
  * `Array{Float64}`:
      state of the system at time t
  # Return
  * Array{Float64}:
  control computed by the policy
  function get_control_sdp_x(t, x_t)

This function is a policy that can be plugged in our simulator.

Question 8 Simulate battery control using value functions along the assessment scenarios. What is the average cost obtained? Compare with the previous methods. Plot the battery state of charge during the day. Do you notice an important discrepancy with the heuristic simulation?

4 Model Predictive Control

Model Predictive Control is a popular method to solve Stochastic Optimal Control problems arising in many engineering fields (electrical, mechanical...). Its popularity comes from its simplicity and accessibility. It consists in making a forecast to get rid of all the future uncertainties at each time step. We can then solve a deterministic problem which provides a control trajectory. We apply the first control. At the next time step we compute a new forecast and we reoptimize.

This algorithm has no offline phase.

Online: computation of a control   
At time t, knowing the state of the system St and the past uncertainties W0,..,Wt1 compute a forecast wt,..,wT1 and compute Ut:

Ut ∈ arg min    min    [   Lk (Sk,Uk, ¯wk+1) + K (ST)]
       Ut∈𝕌  Ut+1,..,UT−1  k=t

4.1 Deterministic problem resolution

We provide a function to solve a deterministic problem knowing a scenario, the initial state and the initial time step. This function is in the file deterministic.jl.

  Compute the solution of the problem where noises are replaced by nominal values
  # Arguments
  * `Array{Float64}`:
      initial state of the system
  * `Int`:
      initial time step at which we solve the problem
  * `Array{Float64}`:
      Noises scenario to replace uncertainties
  # Return
  * `JuMPmodel`:
  JuMP model modelling the deterministic problem
  * Float64:
  objective value obtained
  * Array{Float64}:
  trajectory of the battery state of charge obtained
  * Array{Float64}:
  trajectory of the control obteined
  function solve_anticipative_problem(X_0_loc, t0, stages_horizon, forecast)

Question 9 Use this function to solve the deterministic problem over the whole horizon with initial state x0 and initial time step 0 for each of the assessment scenarios. Why is the average cost lower than the heuristic and dynamic programming one?

4.2 Model Predictive Control simulation

We provide a function to compute a forecast at time t for the next time steps up to the rolling horizon length knowing only the state at time t.

This function is used in the MPC policy as it consists in making a forecast of the future time steps and solve a deterministic problem.

  function classic_mpc_policy(t, x_t)
      forca = compute_forecast(t, T)
      _, _, _, u = solve_anticipative_problem(x_t[1], t, T-t+1, forca)
      return [u[1,1,1]-u[1,2,1]]

Question 10 Compute a simulation with the mpc policy for each of the assessment scenarios. Compare with the previous results.

5 Taking into account non-Markovian structure

We could improve the Model Predictive Control results. Indeed MPC performances depend significantly on the forecast quality. Here we use only information about the state of the system at time t to compute a forecast assuming all the noises are time independent. We the use the previous marginal laws to compute the forecast.

The scenario generator we provided follows the following rules:

  • there is no braking energy during the night as subways are not circulating then t ∈{Tstop,...,Tstart}, Wt = 0
  • during operating hours we model regenerative braking randomness by an autoregressive process of order 1:
    ∀t ∈ {0,..,T    } ∪ {T   ,..,T}, W   = aW     + ξ
            start     stop         t      t−1    t

    with (ξt)t𝕋 a sequence of gaussian white noises.

5.1 Improving MPC forecast

The forecast computed at each time step in the MPC strategy could be based on this model and on the last uncertainty realization. The MPC policy is not anymore a state feedback it lies in the class of mappings:

πt : 𝕏 × 𝕎 →  𝕌

We provide a MPC policy using this information

  function ar_mpc_policy(t, x_t, w_t)
      forca = compute_ar_forecast(t, w_t, T, 100)
      _, _, _, u = solve_anticipative_problem(x_t[1], t, T-t+1, forca)
      return [u[1,1,1]-u[1,2,1]]

Question 11 Compute a simulation with this mpc policy for each of the assessment scenarios. Compare with the previous results.

We can build such kinds of policies in a dynamic programming framework as well.

5.2 Using past informations online in classical SDP

We consider the value functions computed at section ??. During the online phase we can compute the expectation with respect to Wt by exploiting the stochastic process structure. We know here that W follows an autoregressive process (or zero during night hours). We know then that: 𝔼[g(Wt)|Wt1 = wt1] = 𝔼[g(awt1 + ξt)]. Knowing the law of ξt we can therefore estimate the conditional expectation at time t.

Offline phase: computation of value functions   
Ignore the dependence between the noises. We keep the same marginal laws for the random variables. For t from T to 0 compute

Vt(St) = mUitn∈𝕌 𝔼[Lt(St,Ut,Wt+1 ) + Vt+1 (ft(St,Ut,Wt+1 ))]

Online: computation of a control   
At each time step t, observing state St and previous uncertainty realization wt compute Ut:

Ut ∈ arg min 𝔼 [Lt(St,U, awt−1 + ξt) + Vt+1(ft(St,U, awt−1 + ξt))]
       U ∈𝕌

We provide a policy of this form using the value functions we generated earlier.

  Obtain a control the battery management problem at time t 
  knowing state x_t and noise w_{t-1} using value functions
  # Arguments
  * `Int`:
      time step t
  * `Array{Float64}`:
      state of the system at time t
  * `Array{Float64}`:
      realization of the noises at time t-1
  # Return
  * Array{Float64}:
  control computed by the policy
  function get_control_sdp(t, x_t, w)

Question 12 Compute a simulation with this policy for each of the assessment scenarios. Compare with the previous results.

5.3 State augmentation

Using noises correlation information with value functions computed with time independence assumption mitgh be inefficient in practice.

When the randomness is an order 1 autoregressive process it is possible to improve the stochastic dynamic programming results by augmenting the state with 1 state variable. W follows an AR(1) process. We can then take Wt as a state variable, ξ() being the noises of the problem. We just add the following dynamic in the problem (1):

Wt+1  = aWt  + ξt

W() being now a state variable and ξ() the exogenous noises process. We call the new dynamic Ft such that:

F  (S  ,W ,U  ,ξ) = (S +  ρU + − 1U − ,aW  + ξ )
  t  t   t  t t      t     t    ρ  t     t   t

Offline phase: computation of value functions   
For t from T to 0 compute

Vt(St,Wt ) = min 𝔼[Lt(St,Wt, Ut,ξt) + Vt+1(Ft(St,Wt, Ut,ξt))]

Online: computation of a control   
At time t, knowing the state of the system St compute Ut:

U  ∈ arg min 𝔼[L (S ,W  ,U ,ξ ) + V  (F (S ,W  ,U ,ξ ))]
  t    U∈𝕌      t  t   t  t  t     t+1   t  t  t   t t

We define a new instantaneous cost code function, a new dynamic, a new state discretization, a new control discretization and new noises laws.

  function cost_2_vars(t,x,u,w)
      return C[t]*(D[t]+max(0,u[1]-x[2])+min(u[1],0))
  function dynamics_2_vars(t, x, u ,w)
      return [x[1] + dt./3600*(rho*max(u[1],0) + (1/rho)*min(u[1],0)),
  X_steps_2_vars = [4., 50] # states discretization, kWh
  X_bounds_2_vars = [(X_min, X_max), (0, 400)]
  w_laws_2_vars = build_xi_noises_laws(10)

Question 13 What is the number of state variables? What is the number of control variables? What is the difference between the previous instantaneous cost and this one? Same question with the dynamics.

5.4 Value functions generation and forward simulation

We build another stochastic dynamic programming model. We can generate the value functions of the problem with 2 state variables. Now the noises are truly independent.

  V_sdp_2_vars =
  StochDynamicProgramming.solve_DP(battery_2_vars,params_sdp_2_vars, 1);

We provide a function to compute a control online using these value functions. This is a state feedback with the augmented state. It is therefore a function of St and Wt.

Question 14 Using the ξ assessment scenarios that were generated at the beginning, corresponding to our previous assessment scenarios. Assess the policy. Display the average cost obtained.

5.5 Benchmark

Question 15

Compare all the methods by filling the following table. Discuss the results.

Policy Offline timeOnline timeCosts w/ ref scenario




MPC w/ wt

SDP w/ wt

SDP w/ 2 states

Table 1: Benchmark

L'École des Ponts ParisTech est membre fondateur de

L'École des Ponts ParisTech est certifiée 

ParisTech ParisEst ParisTech


Copyright 2014
École des Ponts ParisTech
All rights reserved