fmpds_prosumer

1/26
Factored MDPs for Optimal Prosumer
Decision-Making
Angelos Angelidakis
aggelos@intelligence.tuc.gr
Georgios Chalkiadakis
gehalk@intelligence.tuc.gr
School of Electronic and Computer Engineering
Technical University of Crete
Angelos Angelidakis & Georgios Chalkiadakis Factored MDPs for Optimal Prosumer Decision-Making

2/26
Outline
1 Introduction
2 Background
3 Our Model
4 Solving the Factored MDP
5 Prosumer Production and Consumption Models
6 Experiments and Results

3/26
Prosumer
Produces and consumes energy
Can be a single residence, an industrial facility, or a neighbourhood
May or may not be connected to the electricity grid
Plays a key role in stabilizing the electricity network

4/26
What we do in this paper
Focus on micro-grid prosumers:
– encompassing, e.g., wind turbine generators (WTG), photovoltaic systems (PVS), batteries and household neighbourhoods
Optimize prosumer operation decisions:
– buy energy from and sell energy to utility companies
– store energy
– select the electricity tariffs to subscribe to
while ensuring that consumer needs are satisfied

5/26
Key concepts and contributions
A complete framework for microgrid-prosumer decision-making:
A Factored Markov Decision Process to model the prosumer decision problem
– planning 24 hours ahead
Exact optimal solution, applicable to a microgrid of any size
Consumption- and production-prediction submodels
Tested on a real-world dataset
Comparison with SPUDD
– a robust method for stochastic planning in large environments

6/26
Outline
1 Introduction
2 Background
3 Our Model
4 Solving the Factored MDP
5 Prosumer Production and Consumption Models
6 Experiments and Results

7/26
Stochastic Planning Using Decision Diagrams (SPUDD)
finds (near-)optimal policies in very large problems
combines value iteration with algebraic decision diagrams
In our problem, SPUDD:
produces policies that coincide with ours
but cannot always solve the problem within the required 24 hours
– it operates over an input script, which can grow very large

8/26
Outline
1 Introduction
2 Background
3 Our Model
FMDPs
Factored Representation
Physical Constraints
Transition Function
Factored Reward Representation
4 Solving the Factored MDP
5 Prosumer Production and Consumption Models
6 Experiments and Results

9/26
Factored Markov Decision Processes (FMDPs)
A compact alternative to the standard (flat) MDP representation
States are described by multivariate random variables: $s = (s_1, \dots, s_n)$, with each $s_i \in DOM(s_i)$
The reward function is assumed to be factored into specific components
FMDPs allow for external signals that affect the state variables
Various solution methods exist [1], e.g.:
– linear value functions
– approximate linear programming
– SPUDD
[1]
– Guestrin, Carlos, et al. "Efficient solution algorithms for factored MDPs." Journal of Artificial Intelligence Research, 2003.
– Hoey, Jesse, et al. "SPUDD: Stochastic planning using decision diagrams." Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 1999.

10/26
A Factored Representation of our model
States
hour of day, $DOM(tms) = \{1, \dots, 24\}$
energy stored in the batteries, $DOM(bat) = \{0, \dots, Battery_{max}\}$
tariff the prosumer has subscribed to, $DOM(tf) = \{tf_1, \dots, tf_K\}$
Actions
buy energy, $DOM(buy) = \{-RES_{nom}, \dots, Load_{max}\}$ (negative values denote selling)
charge batteries, $DOM(chg) = \{-Battery_{max}, \dots, Battery_{max}\}$ (negative values denote discharging)
select a tariff, $DOM(seltf) = \{0, \dots, K\}$
External Signals
available price tariffs
– buying and selling prices offered by multiple utility companies for each hour of the day
predicted production, $DOM(prod) = \{0, \dots, RES_{nom}\}$
predicted consumption, $DOM(cons) = \{0, \dots, Load_{max}\}$
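A minimal Python sketch of how the factored state and action spaces can be enumerated from per-variable domains, and how the flat space size is the product of the domain sizes. The sizes `BATTERY_MAX = 4` and `K = 3` are hypothetical toy values, not the experimental ones. `buy` is not enumerated because, once `chg` is chosen, the energy-balance constraint fixes it; reading `seltf = 0` as "keep the current tariff" is an assumption.

```python
from itertools import product
from math import prod

# Hypothetical toy sizes (the experiments use Battery_max = 60 kWh, K = 9).
BATTERY_MAX = 4  # kWh, 1 kWh resolution
K = 3            # number of available tariffs

# One finite domain per state variable: DOM(tms), DOM(bat), DOM(tf).
STATE_DOMAINS = {
    "tms": list(range(1, 25)),
    "bat": list(range(0, BATTERY_MAX + 1)),
    "tf": list(range(1, K + 1)),
}

# One finite domain per action variable; buy is omitted because the
# energy balance determines it once chg, prod and cons are known.
ACTION_DOMAINS = {
    "chg": list(range(-BATTERY_MAX, BATTERY_MAX + 1)),
    "seltf": list(range(0, K + 1)),  # assumption: 0 = keep current tariff
}


def instantiations(domains):
    """Yield every joint assignment of the variables as a dict."""
    names = list(domains)
    for values in product(*(domains[name] for name in names)):
        yield dict(zip(names, values))


def flat_size(domains):
    """Size of the flat (non-factored) space: product of the domain sizes."""
    return prod(len(values) for values in domains.values())


states = list(instantiations(STATE_DOMAINS))
actions = list(instantiations(ACTION_DOMAINS))
print(len(states), flat_size(STATE_DOMAINS))    # 360 360
print(len(actions), flat_size(ACTION_DOMAINS))  # 36 36
```

The factored description stores only three small domains, while the flat space grows multiplicatively with every added variable.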

11/26
Physical Constraints
the electricity energy balance must be maintained:
$$prod_t - cons_t - chg_t + buy_t = 0$$
the storage unit cannot be charged beyond its capacity:
$$chg_t \le Battery_{max} - bat_t$$
the energy discharged cannot exceed the quantity currently stored:
$$-chg_t \le bat_t$$
the state of charge must lie between 20% and 100% [2]:
$$0.2 \le \frac{bat_t}{Battery_{max}} \le 1$$
[2] Chiasson, John, and Baskar Vairamohan. "Estimating the state of charge of a battery." IEEE Transactions on Control Systems Technology, 2005.
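The constraints above can be checked per candidate action with a short Python sketch. It solves the energy balance for `buy`, then tests the capacity, discharge and state-of-charge limits. Applying the 20%-100% bound to the post-action level `bat + chg` (so it also holds at t+1) is our assumption, as are the function names.

```python
def required_purchase(prod, cons, chg):
    """Solve prod - cons - chg + buy = 0 for buy (negative buy = selling)."""
    return cons + chg - prod


def is_feasible(bat, chg, prod, cons, battery_max, res_nom, load_max, min_soc=0.2):
    """Check one (state, charge action) pair against the physical constraints.

    Assumption: the state-of-charge bound is checked on the post-action
    level bat + chg, so that it also holds at t + 1.
    """
    buy = required_purchase(prod, cons, chg)
    if not -res_nom <= buy <= load_max:   # buy must lie in DOM(buy)
        return False
    if chg > battery_max - bat:           # cannot charge beyond capacity
        return False
    if -chg > bat:                        # cannot discharge more than is stored
        return False
    soc_next = (bat + chg) / battery_max
    return min_soc <= soc_next <= 1.0


print(is_feasible(bat=30, chg=10, prod=5, cons=3,
                  battery_max=60, res_nom=100, load_max=100))  # True
print(is_feasible(bat=15, chg=-10, prod=5, cons=3,
                  battery_max=60, res_nom=100, load_max=100))  # False (SOC 5/60)
```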

12/26
Transition Function
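stochastic state transitions in our model:
– the charge succeeds (exactly c is stored) with probability p:
$$\Pr(bat_{t+1} = bat_t + c \mid chg_t = c, bat_t) = p$$
– the charge fails with probability 1 − p, in which case the battery level lands on one of the N levels of a bounded region $bound_{bat}$, each equally likely:
$$\Pr(bat_{t+1} = b \mid chg_t = c, bat_t) = \frac{1 - p}{N}, \quad b \in bound_{bat}$$
– the tariff evolves according to the tariff-selection action $seltf_t \in \{seltf_1, \dots, seltf_K\}$
Overall transition probability (the hour of day advances deterministically):
$$\Pr(tms_{t+1}, bat_{t+1}, tf_{t+1} \mid tms_t, bat_t, tf_t, chg_t, seltf_t) = \Pr(bat_{t+1} \mid bat_t, chg_t) \cdot \Pr(tf_{t+1} \mid tf_t, seltf_t)$$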
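A Python sketch of the factored transition, built as the product of a battery factor and a tariff factor. Several details are assumptions not fixed by the slides: the bounded region is taken as every level within a hypothetical `radius` of the target `bat + chg` (target excluded, clipped to `[0, battery_max]`), tariff selection is deterministic with `seltf = 0` meaning "keep", and the hour wraps from 24 to 1.

```python
def battery_transition(bat, chg, p, radius, battery_max):
    """Pr(bat_{t+1} | bat_t, chg_t) as a {level: probability} dict.

    Assumption: the bounded region is every level within `radius` of the
    target bat + chg (target excluded), clipped to [0, battery_max].
    """
    target = bat + chg
    low, high = max(0, target - radius), min(battery_max, target + radius)
    region = [b for b in range(low, high + 1) if b != target]
    if not region:                      # no alternative levels: charge is certain
        return {target: 1.0}
    dist = {target: p}                  # successful charge
    for b in region:                    # failed charge spread uniformly over N levels
        dist[b] = (1 - p) / len(region)
    return dist


def tariff_transition(tf, seltf):
    """Pr(tf_{t+1} | tf_t, seltf_t); assumed deterministic (0 = keep tariff)."""
    return {tf if seltf == 0 else seltf: 1.0}


def joint_transition(state, action, p, radius, battery_max):
    """Product of the factored transitions; the hour advances deterministically."""
    tms, bat, tf = state
    chg, seltf = action
    next_tms = tms % 24 + 1             # hour 24 wraps around to hour 1
    return {
        (next_tms, b, f): pb * pf
        for b, pb in battery_transition(bat, chg, p, radius, battery_max).items()
        for f, pf in tariff_transition(tf, seltf).items()
    }


dist = joint_transition((24, 30, 2), (5, 0), p=0.9, radius=2, battery_max=60)
print(dist[(1, 35, 2)])               # 0.9
print(round(sum(dist.values()), 10))  # 1.0
```

Because the factors are independent, the joint distribution never needs to be stored as one large table; it is assembled on demand from the two small ones.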

13/26
Factored Reward Representation
Our rewards correspond to costs:
$$Cost(s_t, a_t, s_{t+1}) = C_{energy} + C_{period} + C_{bl}$$
$C_{energy}$: cost per Wh of buying (or, when negative, revenue from selling) electricity
$C_{period}$: periodic subscription cost of the tariff
$C_{bl}$: cost associated with battery-life losses

14/26
$C_{energy}$, the cost per Wh for buying or selling electricity:
$$C_{energy}(tf_{t+1}, buy_t) = \begin{cases} buy_t \cdot buying_{tf_{t+1}} & \text{if } buy_t \ge 0 \\ buy_t \cdot selling_{tf_{t+1}} & \text{if } buy_t < 0 \end{cases}$$
$C_{period}$, the periodic cost of the tariff [3]:
$$C_{period}(tf_{t+1}, price^{t+1}_{tf}) = C_1 \exp\{-C_2 \cdot (buying^{t+1}_{tf} - selling^{t+1}_{tf})\}$$
[Figure: periodic cost versus buying price − selling price]
[3] http://www.eia.gov/state/search/#?1=102&3=21&a=true&2=211
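The two cost terms translate directly into Python. This is a sketch; the constants `C1` and `C2` are not given on the slides, so the values used below (0.01 and 5.0) are hypothetical placeholders. The example prices correspond to tariff 4 of the experiments (buy 0.2, sell 0.1).

```python
import math


def c_energy(buy, buying_price, selling_price):
    """C_energy: purchases are charged at the buying price; a negative buy
    (selling) is multiplied by the selling price, giving a negative cost."""
    price = buying_price if buy >= 0 else selling_price
    return buy * price


def c_period(buying_price, selling_price, c1, c2):
    """C_period = C1 * exp(-C2 * (buying - selling)) for the chosen tariff."""
    return c1 * math.exp(-c2 * (buying_price - selling_price))


C1, C2 = 0.01, 5.0  # hypothetical constants, not from the paper

print(c_energy(10, 0.2, 0.1))    # 2.0
print(c_energy(-10, 0.2, 0.1))   # -1.0
print(c_period(0.2, 0.1, C1, C2))
```

A wider gap between buying and selling prices lowers the periodic cost, matching the decreasing curve in the figure.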

15/26
$C_{bl}$, the cost associated with battery-life losses:
$$C_{bl} = L_{loss} \cdot C_{init-bat}$$
where $C_{init-bat}$ is the initial investment cost of the batteries and
$$L_{loss} = \frac{A_c}{A_{total}}$$
with $A_c$ the effective battery throughput and $A_{total}$ the total cumulative throughput [4]
[4] A battery of size $Q$ Ah will deliver an effective $A_{total} = 390 \cdot Q$ Ah over its lifetime

16/26
$A_c$ is then expressed as:
$$A_c = \lambda_{soc} \cdot A'_c$$
where $\lambda_{soc}$ is an effective weighting factor:
$$\lambda_{soc} = k \cdot SOC + d$$
[Figure: empirical data points $(SOC, \lambda_{soc})$ and the fitted line $\lambda_{soc} = k \cdot SOC + d$]
with the state of charge of the battery:
$$SOC = \frac{bat_t}{Battery_{max}}$$
and the actual throughput:
$$A'_c = \frac{chg_t}{V_{battery}}$$
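Chaining the battery-life formulas gives a one-step cost function. This Python sketch assumes energies in Wh (so that Wh / V yields Ah) and takes the absolute value of the charge so that both charging and discharging count as throughput; both are our assumptions. The fit coefficients `k` and `d` are not reported on the slides, so the values below are hypothetical.

```python
def battery_life_cost(chg_wh, bat_wh, battery_max_wh, v_battery, q_ah,
                      c_init_bat, k, d):
    """One-step battery-life cost C_bl = L_loss * C_init-bat.

    Assumptions: energies in Wh, and abs(chg_wh) is used so that charging
    and discharging both count as throughput.
    """
    soc = bat_wh / battery_max_wh                 # SOC = bat_t / Battery_max
    lambda_soc = k * soc + d                      # fitted weighting factor
    actual_ah = abs(chg_wh) / v_battery           # actual throughput chg_t / V_battery
    effective_ah = lambda_soc * actual_ah         # effective throughput A_c
    a_total = 390 * q_ah                          # lifetime throughput (footnote 4)
    l_loss = effective_ah / a_total               # L_loss = A_c / A_total
    return l_loss * c_init_bat


# Hypothetical numbers: k = 0, d = 1 makes lambda_soc = 1, so moving
# 1200 Wh at 12 V (100 Ah) through a 100 Ah battery costing 3900
# consumes 100 / 39000 of its lifetime, i.e. a cost of about 10.
print(battery_life_cost(1200, 30000, 60000, 12, 100, 3900, k=0.0, d=1.0))
```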

17/26
Outline
1 Introduction
2 Background
3 Our Model
4 Solving the Factored MDP
5 Prosumer Production and Consumption Models
6 Experiments and Results

18/26
Solving the Factored MDP
for all instantiations of s do
    set $V_{T+1}(s) = 0$
end
for all time-steps t in descending order (i.e., with 1, …, T stages-to-go) do
    for all instantiations of $s_t$ do
        $V_t(s_t) \leftarrow \max_{a_t} \sum_{s_{t+1}} \Pr(s_{t+1} \mid a_t, s_t) \cdot [R(s_t, a_t, s_{t+1}) + V_{t+1}(s_{t+1})]$
    end
end
for all instantiations of s and all time-steps t do
    $\pi(s, t) = \arg\max_{a} \sum_{s'} \Pr(s' \mid a, s) \, (R(s, a, s') + V_{t+1}(s'))$
end
Value iteration operating on a finite-horizon problem provides the optimal solution for a prosumer of any size within the required time.
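The backward-induction loop above can be written as a compact, runnable Python sketch. It folds the separate policy-extraction loop into the value update (recording the argmax as it is found), which yields the same policy. Rewards are passed as minus the cost, since the model's rewards correspond to costs. The three-level toy battery problem is hypothetical and only illustrates the mechanics.

```python
def finite_horizon_vi(states, actions, transition, reward, horizon):
    """Backward induction for a finite-horizon MDP.

    states:     iterable of hashable states
    actions:    function s -> iterable of feasible actions (must be non-empty)
    transition: function (s, a) -> {s_next: probability}
    reward:     function (s, a, s_next) -> float (here: minus the cost)
    Returns V[t][s] and policy[t][s] for t = 1..horizon, with V[horizon + 1] = 0.
    """
    states = list(states)
    V = {horizon + 1: {s: 0.0 for s in states}}
    policy = {}
    for t in range(horizon, 0, -1):             # t = T, ..., 1
        V[t], policy[t] = {}, {}
        for s in states:
            best_a, best_q = None, float("-inf")
            for a in actions(s):
                q = sum(prob * (reward(s, a, s2) + V[t + 1][s2])
                        for s2, prob in transition(s, a).items())
                if q > best_q:
                    best_a, best_q = a, q
            V[t][s], policy[t][s] = best_q, best_a
    return V, policy


# Hypothetical toy instance: battery levels 0..2, one unit of demand per
# step, no production, a constant buying price of 1, deterministic charging.
def feasible(b):
    return [c for c in (-1, 0, 1) if 0 <= b + c <= 2]


def step(b, c):
    return {b + c: 1.0}


def neg_cost(b, c, b_next):
    return -1.0 * (1 + c)  # buy = cons + chg - prod = 1 + chg


V, policy = finite_horizon_vi([0, 1, 2], feasible, step, neg_cost, horizon=2)
print(V[1])          # {0: -2.0, 1: -1.0, 2: 0.0}
print(policy[1][2])  # -1 (discharge instead of buying)
```

Plugging in a stochastic transition such as the factored battery/tariff model only changes the `transition` argument; the loop structure stays the same.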

19/26
Outline
1 Introduction
2 Background
3 Our Model
4 Solving the Factored MDP
5 Prosumer Production and Consumption Models
Production Prediction
Consumption Prediction
6 Experiments and Results

20/26
Production Prediction
RENES: a web-based PVS and WTG production prediction tool [5]
employs free-of-charge weather forecasts
Developed in our lab
[5] http://www.intelligence.tuc.gr/renes/

21/26
Consumption prediction for real households data
Polynomial degree | MSE
1 | 0.022372
2 | 0.021312
3 | 0.020175
4 | 0.017679
5 | 0.016861
6 | 0.017329
7 | 0.017355
8 | 0.017167
9 | 0.017399
10 | 0.017611
MSE of Bayesian linear regression with polynomial basis functions Φ
Method | MSE
GP with polynomial kernel (GP-poly) | 0.0173
GP with Gaussian kernel (GP-G) | 0.006943
Bayesian linear regression (BLR) | 0.0169
MSE of GP and Bayesian linear regression
[Figure: consumption (kWh) over time of day, showing training data $(x_{train}, y_{train})$, test data $(x_{test}, y_{test})$, the variance of the trained area, and the GP-poly, GP-G and BLR predictions]
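To illustrate what the GP-G row refers to, here is a dependency-free Python sketch of Gaussian-process regression with a Gaussian (squared-exponential) kernel, plus the MSE metric used in the tables. The kernel hyperparameters (length scale 3 hours, variance 1, noise 0.01) and the consumption values are made up for illustration; they are not the paper's settings or data. Targets are centred on their mean because the GP prior used here has zero mean.

```python
import math


def gaussian_kernel(x1, x2, length_scale=3.0, variance=1.0):
    """Squared-exponential (Gaussian) kernel on the hour of day."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def gp_posterior_mean(x_train, y_train, x_test, noise=1e-2):
    """GP posterior mean: mu + k(x, X) (K + noise * I)^-1 (y - mu)."""
    mu = sum(y_train) / len(y_train)
    k = [[gaussian_kernel(xi, xj) + (noise if i == j else 0.0)
          for j, xj in enumerate(x_train)] for i, xi in enumerate(x_train)]
    alpha = solve(k, [y - mu for y in y_train])
    return [mu + sum(a * gaussian_kernel(x, xi) for a, xi in zip(alpha, x_train))
            for x in x_test]


def mse(y_true, y_pred):
    """Mean squared error, the metric reported in the tables."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


hours = [0, 4, 8, 12, 16, 20]         # made-up training inputs (hour of day)
kwh = [0.5, 0.4, 0.9, 0.7, 1.2, 1.4]  # made-up consumption values
print(gp_posterior_mean(hours, kwh, [2, 6, 10]))
```

In practice the kernel hyperparameters would be fitted to the training data (e.g. by maximizing the marginal likelihood) rather than fixed as here.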

22/26
Outline
1 Introduction
2 Background
3 Our Model
4 Solving the Factored MDP
5 Prosumer Production and Consumption Models
6 Experiments and Results

23/26
Experiments and Results
30 households in New Hampshire
20 PV modules with a nominal power of 60 kW per module
2 wind turbines with a nominal power of 1000 kW each
24 deep-cycle 12 V batteries, 212 Ah C20 / FMD200 – VRLA/AGM, costing €269.00 each
Battery lifetime: 10–12 years
[Figure: RES production and load (kWh) over time]
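A quick worked calculation on the setup figures above, in Python. It assumes the bank's nominal energy is simply N × V × Ah (independent of how the batteries are wired), that the 390 · Q Ah lifetime rule from footnote 4 applies to these batteries, and that the total battery cost plays the role of C_init-bat; all three are our assumptions.

```python
# Figures taken from the experimental setup above.
NUM_BATTERIES = 24
BATTERY_VOLTAGE_V = 12
BATTERY_CAPACITY_AH = 212
BATTERY_UNIT_COST_EUR = 269.00
PV_MODULES, PV_MODULE_KW = 20, 60
WIND_TURBINES, WIND_TURBINE_KW = 2, 1000

# Nominal energy of the battery bank: N * V * Ah, converted from Wh to kWh.
bank_energy_kwh = NUM_BATTERIES * BATTERY_VOLTAGE_V * BATTERY_CAPACITY_AH / 1000
# Total battery investment, a candidate value for C_init-bat.
bank_cost_eur = NUM_BATTERIES * BATTERY_UNIT_COST_EUR
# Lifetime throughput per battery under the 390 * Q Ah rule (footnote 4).
lifetime_throughput_ah = 390 * BATTERY_CAPACITY_AH
# Installed renewable nominal power (PV + wind).
installed_res_kw = PV_MODULES * PV_MODULE_KW + WIND_TURBINES * WIND_TURBINE_KW

print(round(bank_energy_kwh, 3))   # 61.056
print(bank_cost_eur)               # 6456.0
print(lifetime_throughput_ah)      # 82680
print(installed_res_kw)            # 3200
```

The roughly 61 kWh nominal bank energy appears consistent with the 0 to 60 kWh battery-level range used in the model.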

24/26
Actions – States
battery levels (0 to 60 kWh in 1 kWh steps)
bat = [0kWh : 1kWh : 60kWh]
charge action (−60 to 60 kWh in 1 kWh steps)
chg = [−60kWh : 1kWh : 60kWh]
tariffs
Tariff | Buy | Sell
1 | 0.1 | 0.1
2 | 0.1 | 0.2
3 | 0.1 | 0.3
4 | 0.2 | 0.1
5 | 0.2 | 0.2
6 | 0.2 | 0.3
7 | 0.4 | 0.1
8 | 0.3 | 0.2
9 | 0.3 | 0.3
transition boundaries
– boundary_bat = 1 kWh
– boundary_tf = 0.1 €
– the maximum number of transitions is ∼15
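The |S × A| values in the results table can be reproduced with a short Python calculation, counting battery levels, charge actions, tariffs and tariff-selection values (hour of day and `buy` excluded). This is one factorization that happens to match both reported sizes; whether the authors counted exactly this way, and whether the larger instance corresponds to a 120 kWh battery range, is our assumption.

```python
def space_size(battery_max_kwh, num_tariffs, step_kwh=1):
    """|S x A| counted over bat, chg, tf and seltf (hour of day and buy excluded)."""
    bat_levels = battery_max_kwh // step_kwh + 1          # 0 .. Battery_max
    chg_levels = 2 * (battery_max_kwh // step_kwh) + 1    # -Battery_max .. Battery_max
    seltf_values = num_tariffs + 1                        # DOM(seltf) = {0 .. K}
    return bat_levels * chg_levels * num_tariffs * seltf_values


print(space_size(60, 9))    # 664290  = 61 * 121 * 9 * 10
print(space_size(120, 9))   # 2624490 = 121 * 241 * 9 * 10
```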

25/26
Results VI–SPUDD
Both SPUDD and our method compute the same (optimal) policies...
However...
Results
Horizon | S × A size | Bounded region size | Our method (hours) | SPUDD script generation (hours) | SPUDD execution time (hours) | SPUDD total time (hours)
24 | 664290 | 15 | 1.76 | 13.4992 | 0.184 | 13.6832
24 | 664290 | 90 | 15.84 | 46.9188 | 1.19 | 48.1088
24 | 2624490 | 15 | 8.7603 | 36.98 | 0.73975 | 37.71975
48 | 664290 | 15 | 3.5 | 16.8221 | 0.4271 | 17.2492
48 664290 15 3.5 16.8221 0.4271 17.2492


26/26
Wrapping-Up
A complete framework for optimal microgrid-prosumer decision-making
Simple yet effective solution method
Tested on a real-world dataset
Outperforms a well-known stochastic planning method (SPUDD) in solution computation time, by a factor of roughly 3 to 8 in our experiments
In progress: testing alternative methods [6] and developing novel techniques for tackling large-scale problems
Thank you, any questions?
[6] Munos, Remi, and Csaba Szepesvari. "Finite-time bounds for fitted value iteration." The Journal of Machine Learning Research, 2008.
