We consider here a house equipped with a thermal hot water tank, a combined heat and power generator (CHP) and a battery. This house has two systems to satisfy its energy needs: a thermal system for heating, and an electrical system to satisfy its electrical demand. These two systems are coupled together via the CHP, rendering the optimal control difficult to find.

The CHP is a generator which produces both electricity and heat while burning gas. The power of the CHP cannot be modulated: it always produces the same amounts of heat and electricity.

A battery can store the potential surplus of electricity (especially when demand is low), and the house is connected to the grid to import electricity if necessary. The heat produced by the CHP is stored in a hot water tank. Hot water is withdrawn from this tank to satisfy the thermal demand.

We aim to decrease the operational costs while taking into account that:

the electrical price depends upon time, as shown in Figure 2,

electrical and thermal demands are highly variable (the evolution of these demands is displayed in Figure 3).

Figure 1: Diﬀerent systems constituting the domestic microgrid.

Figure 2: Electrical price

Figure 3: Diﬀerent scenarios for demand

1.2 System dynamics

We model our system with discrete linear equations, stating the evolution of the system between two timesteps t and t + 1. Controls are computed at the beginning of the time interval [t, t + 1[, and are applied all along this interval. This gives a discrete time optimal control problem.

We control the system during one day, at a 15-minute timestep.
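As a quick sanity check, the horizon and timestep above yield 96 control intervals over the day (the variable names below are ours, not imposed by the text):

```julia
# Time discretization: one day controlled at a 15-minute timestep.
# The names dt, horizon and ntime are ours, chosen for illustration.
dt = 0.25                  # timestep in hours (15 minutes)
horizon = 24.0             # control horizon in hours (one day)
ntime = Int(horizon / dt)  # number of intervals [t, t+1[ over the day: 96
```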

1.2.1 Battery dynamics

B_{t} is the energy stored inside the battery at time t. The battery's dynamics write:

B_{t+1} = B_{t} + Δt F_{B,t},    (1)

where Δt is the timestep and F_{B,t} the power exchanged with the battery during [t, t + 1[.

Battery constraints

The battery level must remain greater than a given level, otherwise its life expectancy decreases much more quickly. Usually, it is specified that this level remains between 30% and 90% of the maximum level of charge. The charge and discharge are also bounded by a maximum ramp rate ΔB. These two constraints write:

0.3 B^{max} ≤ B_{t} ≤ 0.9 B^{max},    (2)

and

−ΔB ≤ F_{B,t} ≤ ΔB.    (3)
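As an illustration of the dynamics (1) and the constraints (2)-(3), here is a minimal sketch in which the capacity, the 30%-90% band and the ramp rate ΔB are placeholder values of our own:

```julia
# Sketch of the battery dynamics and constraints (all numeric values are
# illustrative placeholders, not data from the assignment).
dt = 0.25                                # timestep [h]
Bmax = 3.0                               # assumed battery capacity [kWh]
Bmin, Bsup = 0.3 * Bmax, 0.9 * Bmax      # 30%-90% state-of-charge band
dB = 1.8                                 # assumed max charge/discharge rate [kW]

# One step of the linear dynamics: B_{t+1} = B_t + dt * F_B
battery_step(B, FB) = B + dt * FB

# A control FB is admissible at state B if it respects the bounds above.
admissible(B, FB) = abs(FB) <= dB && Bmin <= battery_step(B, FB) <= Bsup
```

For instance, charging at 1 kW from B = 1.5 kWh is admissible, while charging at 2 kW violates the ramp bound.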

1.2.2 Hot water tank dynamics

We have two inputs for the hot water tank:

the heat F_{gh,t} = (1 − α) P_{chp,t} produced by the CHP,

the heat F_{A,t} produced by the auxiliary burner,

and one output: the thermal demand of the house D_{t}^{th}.

We denote by H_{t} the tank's level, that is, the energy stored inside the tank at time t (in kWh). This level is bounded:

0 ≤ H_{t} ≤ H^{max}.    (4)

The hot water tank is modelled with:

H_{t+1} = H_{t} + Δt (F_{gh,t} + F_{A,t} − D_{t}^{th} + F_{th,t+1}).    (5)

All these equations are designed so that the constraints (4) are satisfied for all demands D_{t}^{th}.

The recourse variable F_{th,t+1} is penalized afterward in the cost function.
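A sketch of one tank step consistent with the description above; the capacity, the CHP split α and the saturation logic are our own illustrative choices:

```julia
# Sketch of the hot water tank dynamics (illustrative values and logic).
dt = 0.25               # timestep [h]
Hmax = 10.0             # assumed tank capacity [kWh]
alpha = 0.45            # assumed electrical share of the CHP output

# Heat flows in from the CHP (F_gh = (1 - alpha) * P_chp) and the auxiliary
# burner (F_A); heat flows out to the thermal demand D_th. The recourse
# variable F_th >= 0 covers demand the tank cannot serve.
function tank_step(H, Pchp, FA, Dth)
    Fgh = (1 - alpha) * Pchp
    Hnext = H + dt * (Fgh + FA - Dth)
    Fth = max(0.0, -Hnext)                     # unsatisfied thermal demand
    return clamp(Hnext + Fth, 0.0, Hmax), Fth  # keep H within its bounds (4)
end
```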

1.3 Criterion: operational costs

Electrical exchange

We denote by F_{N,t} the electricity imported from the grid at time t.

The balance equation between production and demand in the upper electrical part of Figure 1 writes:

α P_{chp,t} + F_{N,t} = D_{t}^{el} + F_{B,t}.    (6)

The electricity imported from the network is paid at a tariff π_{t}^{el}, which depends upon time. The price scenario π_{t_0}^{el}, …, π_{T_f}^{el} is supposed to be known in advance.

At each timestep, the operational costs sum:

the cost to use the CHP, that is, the gas burnt to produce heat and electricity: π^{gas} P_{chp,t},

the cost to use the auxiliary burner π^{gas}F_{A,t},

the cost to import electricity from the grid π_{t}^{el}F_{N,t},

the virtual cost of discomfort if the thermal demand is unsatisfied: π^{th} F_{th,t+1}.
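The four terms above add up to a stage cost; a sketch with placeholder prices (the values of π^gas, π^th and the sample π^el are ours):

```julia
# Stage cost gathering the four terms listed above (placeholder prices).
pi_gas = 0.06   # assumed gas price [euros/kWh]
pi_th  = 10.0   # assumed discomfort price [euros/kWh]

# pi_el is the time-dependent electricity tariff; FN the grid import,
# Fth the thermal recourse variable.
stage_cost(pi_el, Pchp, FA, FN, Fth) =
    pi_gas * Pchp + pi_gas * FA + pi_el * FN + pi_th * Fth
```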

1.4 Optimization criterion

We can now write the optimization criterion:

min 𝔼 [ Σ_{t=t_0}^{T_f−1} ( π^{gas} P_{chp,t} + π^{gas} F_{A,t} + π_{t}^{el} F_{N,t} + π^{th} F_{th,t+1} ) ].    (8)

We recall the diﬀerent notations we used previously in Table 1.

State variables
    B_{t}          Battery level                          kWh
    H_{t}          Tank level                             kWh

Control variables
    P_{chp,t}      CHP power                              kW
    F_{B,t}        Power exchanged with the battery       kW
    F_{A,t}        Production of the auxiliary burner     kW

Noises
    D_{t}^{el}     Electrical demand                      kW
    D_{t}^{th}     Thermal demand                         kW

Parameters
    π^{el}         Electricity price                      euros/kWh
    π^{gas}        Gas price                              euros/kWh
    π^{th}         Discomfort price                       euros/kWh

Table 1: Notations

2 Simulation and evaluation of given policies

The states, controls and perturbations are now denoted:

x_{t} = (B_{t}, H_{t}),  u_{t} = (P_{chp,t}, F_{B,t}, F_{A,t}),  w_{t} = (D_{t}^{el}, D_{t}^{th}).    (9)

We suppose that states belong to a static set, x_{t} ∈ 𝕏, and likewise controls, u_{t} ∈ 𝕌.

A demand scenario is a sequence of demands from time t_{0} up to T_{f} − 1:

w(⋅) = (w_{t_0}, …, w_{T_f−1}),    (10)

where, for all t, w_{t} = (D_{t}^{el}, D_{t}^{th}).

w(⋅) is supposed to be the realization of a stochastic process (W_{t})_{t=0}^{Tf}.

2.1 Evaluation of a policy

An admissible policy γ : 𝕋 × 𝕏 → 𝕌 assigns a control u = γ(t, x) ∈ 𝕌 to any time t ∈ 𝕋 and to any state x ∈ 𝕏.

Given an admissible policy γ and a demand scenario w(⋅), we are able to build trajectories x(⋅) for the battery and the hot water tank: the dynamics is initialized at time t_{0} by x_{t_0} = x_{0}, and the trajectory is computed step by step with the following equation: x_{t+1} = f_{t}(x_{t}, γ(t, x_{t}), w_{t}).

We obtain the payoff associated with the demand scenario w(⋅):

J(γ, w(⋅)) = Σ_{t=t_0}^{T_f−1} L_{t}(x_{t}, γ(t, x_{t}), w_{t}),    (11)

where L_{t} is the instantaneous cost appearing in the criterion (8).

The expected payoﬀ associated with the policy γ is:

J(γ) = 𝔼 [ Σ_{t=t_0}^{T_f−1} L_{t}(X_{t}, γ(t, X_{t}), W_{t}) ],    (12)

where the expectation 𝔼 is taken with respect to the product probability ℙ. We evaluate (12) by the Monte-Carlo method using N_{s} scenarios w^{1}(⋅), …, w^{N_s}(⋅):

J(γ) ≈ (1/N_{s}) Σ_{i=1}^{N_s} J(γ, w^{i}(⋅)).    (13)
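The estimator (13) is a plain average of scenario payoffs; a sketch with a dummy payoff function standing in for J(γ, w(⋅)):

```julia
# Monte-Carlo estimate of the expected payoff: average J over Ns scenarios.
# `payoff` is a dummy stand-in for the scenario cost J(gamma, w(.)).
payoff(w) = sum(w)
scenarios = [[1.0, 2.0], [3.0, 4.0]]   # Ns = 2 dummy demand scenarios
montecarlo(payoff, scenarios) =
    sum(payoff(w) for w in scenarios) / length(scenarios)
```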

In the sequel, we will evaluate each algorithm with this method:

We compute trajectories with the help of the function γ given by the algorithm.

The data used for scenarios generation can be obtained here in zip format code.zipor tgz code.tgz.

Question 1

Copy this code and execute it.

Plot 10 scenarios of electrical demand and 10 scenarios of thermal demand.

2.2 Simulation of a given policy

The function simulate is used to simulate a policy:

function simulate(scenarios, policy)
    nassess = size(scenarios, 2)
    ntime = size(scenarios, 1)
    x0 = [.6, 20.]
    stocks = zeros(nassess, ntime, 2)
    controls = zeros(nassess, ntime - 1, 3)
    costs = zeros(nassess)
    # Set first value of stocks equal to x0:
    for i in 1:nassess
        stocks[i, 1, :] = x0
    end
    for t = 1:ntime-1
        for k = 1:nassess
            state_t = stocks[k, t, :]
            aleas = scenarios[t, k, :]
            uc = policy(t, state_t)
            xf = dynamic(t, state_t, uc, aleas)
            stocks[k, t+1, :] = xf
            controls[k, t, :] = uc
            costs[k] += cost(t, state_t, uc, aleas)
        end
    end
    return costs, stocks, controls
end

Question 2

What are the inputs of the function simulate? And its outputs?

Relate this function to the theoretical model above.

We consider a ﬁrst policy, given by the echelon-based heuristic:

If the tank's level is less than a given critical level H^{c}, then switch on the CHP.

Once the tank is ﬁlled, switch oﬀ the CHP.

Question 3

Implement this heuristic. The function should take as input the time t and the current state x_{t}, and return a control u_{t}.

Apply this policy along the scenarios generated in Question 1.

Show the costs’ distribution. Plot 10 trajectories for states and controls.

3 Solving by two diﬀerent methods

We are going to compare two policies to solve the problem (8):

a policy produced by the so-called Model Predictive Control (MPC),

a policy based upon Dynamic Programming (DP).

We will assess each policy along the scenarios generated in Question 1. However, these policies need some scenarios as input for their algorithms. To be fair, we will use optimization scenarios distinct from the assessment scenarios considered in Question 1.

As we are considering a real-world problem, we do not have that many recorded scenarios. That is why we use only 30 scenarios for optimization and 30 scenarios for assessment.

3.1 Model Predictive Control

As time goes on, MPC computes, at the beginning of each period [τ, τ + 1[, the control as a function of the current time τ and state x_{τ}. Here are the details:

Derive a deterministic scenario of demands (forecast) from the optimization scenarios.

Solve the deterministic optimization problem

min_{u_τ,…,u_{T_f−1}} Σ_{t=τ}^{T_f−1} L_{t}(x_{t}, u_{t}, w̄_{t})  s.t.  x_{t+1} = f_{t}(x_{t}, u_{t}, w̄_{t}),  x_{τ} given,  x_{t} ∈ 𝕏,  u_{t} ∈ 𝕌.    (14)

Apply only the first control u_{τ}^{♯} at time τ, and iterate at time τ + 1.

The problem (14) returns a control u_{t}^{♯} for all time t and state x_{t}. This problem gives the MPC policy:

γ^{MPC}(t, x_{t}) = u_{t}^{♯}.    (15)
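The loop above can be skeletonized as follows; the forecast is taken as the pointwise mean of the optimization scenarios, and `solve_deterministic` is a hypothetical stand-in for a solver of problem (14):

```julia
using Statistics

# Skeleton of the MPC loop. `solve_deterministic` is a hypothetical
# placeholder for a solver of the deterministic problem (14); here it
# returns a dummy all-zero plan so the skeleton runs.
forecast(scenarios) = vec(mean(scenarios, dims=2))  # pointwise mean forecast
solve_deterministic(tau, x, wbar) = zeros(length(wbar) - tau + 1)

function mpc_policy(tau, x, scenarios)
    wbar = forecast(scenarios)                 # deterministic forecast
    plan = solve_deterministic(tau, x, wbar)   # open-loop plan from tau on
    return plan[1]                             # apply only the first control
end
```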

Question 4

Apply the MPC policy along the scenarios generated in Question 1.

Show the costs’ distribution. Plot 10 trajectories for states and controls.

Does the trajectory of the battery’s state of charge seem reasonable?

3.2 Stochastic Dynamic Programming

Dynamic programming works in two stages: an offline stage where value functions are computed, and an online stage where these functions are used to compute a policy.

Oﬄine: computation of value functions

Use the optimization scenarios to estimate the marginal distributions of the noises W_{t}.

Use these marginal distributions to compute the so-called value functions, given by:

V_{T_f}(x) = 0,
V_{t}(x) = min_{u ∈ 𝕌} 𝔼 [ L_{t}(x, u, W_{t}) + V_{t+1}(f_{t}(x, u, W_{t})) ].
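On a discretized grid the backward recursion can be sketched as below, on a one-dimensional toy stock with a dummy cost and dynamics of our own (the real problem has two stocks):

```julia
# Toy backward Bellman recursion on a 1-D discretized stock (illustrative).
grid = 0.0:0.5:2.0                   # discretized state space X^d
controls = [-0.5, 0.0, 0.5]          # discretized control space U^d
noises = [(0.0, 0.5), (1.0, 0.5)]    # (value, probability) pairs for W_t
T = 4
cost(x, u, w) = (x + u - w)^2        # dummy stage cost
dyn(x, u, w) = clamp(x + u - w, 0.0, 2.0)
nearest(x) = argmin([abs(x - g) for g in grid])  # project next state on grid

V = [zeros(length(grid)) for _ in 1:T+1]  # V[T+1] is the zero terminal cost
for t in T:-1:1, (i, x) in enumerate(grid)
    V[t][i] = minimum(
        sum(p * (cost(x, u, w) + V[t+1][nearest(dyn(x, u, w))])
            for (w, p) in noises)
        for u in controls)
end
```

A projection on the nearest grid point is used here for brevity; the interpolation discussed in Appendix A is the better choice.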

Online: construction of the policy

The DP policy computes, at the beginning of each period [τ, τ + 1[, the control as a function of the current time τ and state x_{τ}:

Solve the optimization problem with the help of the previously computed value functions (V_{t})_{t=0}^{T_f}:

u_{τ}^{♯} ∈ argmin_{u ∈ 𝕌} 𝔼 [ L_{τ}(x_{τ}, u, W_{τ}) + V_{τ+1}(f_{τ}(x_{τ}, u, W_{τ})) ].

Apply the control u_{τ}^{♯} at time τ, and iterate at time τ + 1.

The state equation x_{t+1} = f_{t}(x_{t}, u_{t}, w_{t}) is a functional relation in the continuous state space 𝕏. As we use a computer, we will discretize the state and control spaces to implement dynamic programming. We restrict ourselves to a subset 𝕏^{d} ⊂ 𝕏 for states and 𝕌^{d} ⊂ 𝕌 for controls.
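Gridding each dimension gives the discretized sets; a sketch with placeholder bounds and steps of our own (13 battery levels times 11 tank levels):

```julia
# Sketch: discretizing the two-dimensional state (B, H) into X^d.
# Bounds and steps are illustrative placeholders.
Bgrid = 0.0:0.25:3.0        # battery levels [kWh], 13 points
Hgrid = 0.0:1.0:10.0        # tank levels [kWh], 11 points
Xd = [(B, H) for B in Bgrid, H in Hgrid]  # discretized state space
# card(X^d) = 13 * 11 = 143
```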

Implementation

The Dynamic Programming solver is already implemented. It is called by the function:

Vs = solve_DP(sdpmodel, paramSDP, 1)

which returns the Bellman functions in a multidimensional array Vs.

Question 5

Generate the marginal laws with the help of the given optimization scenarios and the function generate_laws.

Question 6

Compute Bellman functions with the function solve_DP.

Apply the Dynamic Programming policy along the scenarios generated in Question 1.

Show the costs’ distribution. Plot 10 trajectories for states and controls.

Question 7

Plot different Bellman functions with the help of the function plot_bellman_functions. Comment on the evolution of the shape of these functions along time.

Question 8

Find the discretization used for states and controls. What is the cardinality of the set 𝕏^{d} (approximation of the space 𝕏)? And that of the set 𝕌^{d}?

What happens if we refine the discretization of the tank and the battery by a factor 10?

3.3 Comparison between diﬀerent policies

We want to compare three diﬀerent features for each algorithm:

the oﬄine computation time,

the online computation time,

the average costs obtained upon the assessment scenarios.

Policy       Offline computation time    Online computation time    Average costs
Heuristic
MPC
DP

Table 2: Benchmark

Question 9

Fill Table 2 with the results previously obtained.

A Discretization and interpolation schemes

We have two stocks: a battery and a thermal tank. Even if the dynamic programming principle still holds, the numerical implementation is not as straightforward as in the unidimensional case.

A.1 Discretization

We need to discretize the state and control spaces to apply dynamic programming. We restrict ourselves to a subset 𝕏^{d} ⊂ 𝕏 for states and 𝕌^{d} ⊂ 𝕌 for controls. The policy found by dynamic programming is then suboptimal.

To iterate upon the different states in 𝕏^{d}, we define a bijective mapping between 𝕏^{d} and ℕ. This mapping is written:

φ : 𝕏^{d} → {1, …, card(𝕏^{d})}.    (16)

Question 10

Find a mapping that satisfies the conditions previously stated.

Implement it.

A.2 Interpolation

With this discretization, we know the value of the Bellman function V only upon the grid 𝕏^{d}, and not upon the whole state space 𝕏.

We recall that the next state is given by the state equation x_{t+1} = f_{t}(x_{t}, u_{t}, w_{t}). However, even if x_{t} ∈ 𝕏^{d} and u_{t} ∈ 𝕌^{d}, we cannot ensure that x_{t+1} ∈ 𝕏^{d} in order to compute the future cost V_{t+1}(x_{t+1}). That is why we need an interpolation scheme to approximate V_{t+1}(x_{t+1}) from the values of V_{t+1} on 𝕏^{d}.
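In one dimension, the required interpolation looks like this (a sketch; with two stocks, a bilinear version over the grid of Figure 4 is needed):

```julia
# 1-D linear interpolation of a Bellman function known only on a regular grid.
function interp(grid::AbstractRange, V::Vector{Float64}, x::Float64)
    x = clamp(x, first(grid), last(grid))                 # stay inside the grid
    i = min(searchsortedlast(grid, x), length(grid) - 1)  # left grid index
    theta = (x - grid[i]) / step(grid)                    # barycentric weight
    return (1 - theta) * V[i] + theta * V[i + 1]
end
```

For grid = 0.0:1.0:4.0 and V = [0.0, 1.0, 4.0, 9.0, 16.0], interp(grid, V, 2.5) returns the linear blend of the two neighbouring grid values.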

Figure 4: x_{t+1} could be outside the grid deﬁned by 𝕏^{d}

Question 11

With the notation introduced in Figure 4, write the linear interpolation of V_{t+1}(x_{t+1}) with respect to the grid points X_{i}^{d}.