The goal of the MonetDB/DataCell project is to exploit the power of relational DBMSs (RDBMSs) for efficient processing of continuous queries over streaming data. This presentation first identifies the essential differences between processing one-time queries and continuous queries. It then presents the current architecture of MonetDB/DataCell and some ideas on how to extend an existing RDBMS with just a handful of new components to handle continuous queries.
The presentation was given by Ying Zhang (Centrum Wiskunde & Informatica) at the PlanetData project meeting, February 28 - March 4, 2011, in Innsbruck, Austria.
Exploiting Relational Databases for Efficient Stream Processing
1. MonetDB/DataCell
Exploiting the Power of Relational Databases for Efficient Stream Processing
CWI
Project Meeting@Innsbruck
Feb 28 - Mar 04, 2011
Wednesday, March 02, 2011
3. DBMS versus DSMS
DBMS: Incoming data -> DB (disk storage) -> answer to a one-time query
1 Store incoming tuples
2 Submit one-time query
3 Query processing on the already stored data
4 Create answer
DSMS: Input stream -> Continuous queries (memory) -> notification
1 Submit continuous queries
2 Incoming streams
3 Input stream is processed on the fly
4 The produced results are continuously delivered to the clients
A data stream is a never-ending sequence of tuples
4. One-time Queries versus Continuous Queries
[Figure: timeline of data arrival times (t_n, t_n+1), marking the arrival time of q and the spans covered by a one-time query and by a continuous query]
One-time query
• Evaluated once over the already stored tuples
Continuous query
• Waits for future incoming tuples
• Evaluated continuously as new tuples arrive
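The distinction above can be illustrated with a minimal Python sketch. It is not DataCell code: the function names, the integer tuples and the predicate are assumptions chosen for illustration only. The one-time query runs once over a stored list, while the continuous query is a generator that stays registered and yields a result each time a qualifying tuple arrives.

```python
from typing import Callable, Iterable, Iterator, List


def one_time_query(stored: List[int], predicate: Callable[[int], bool]) -> List[int]:
    """Evaluate once over the tuples that are already stored."""
    return [t for t in stored if predicate(t)]


def continuous_query(stream: Iterable[int], predicate: Callable[[int], bool]) -> Iterator[int]:
    """Stay registered and emit a result whenever a qualifying tuple arrives."""
    for t in stream:
        if predicate(t):
            yield t


def is_large(t: int) -> bool:
    # Hypothetical predicate, standing in for a WHERE clause.
    return t > 10


stored = [5, 20, 7]          # tuples that arrived before q
future = iter([30, 2, 11])   # tuples that arrive after q

one_time_result = one_time_query(stored, is_large)
continuous_result = list(continuous_query(future, is_large))
```

The one-time query never sees the tuples in `future`, and the continuous query never sees the tuples in `stored`.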
9. Observation
• Nowadays stream systems are built from scratch
• Redesign operators and optimizations
• Relational Databases are considered inefficient and too complex
• Modern stream applications require management of both stored and streaming data
10. Goals
• We design the DataCell on top of an existing database kernel
• Exploit database techniques, query optimization and operators
• Provide full language functionalities (SQL’03)
• Research questions
• is it viable?
• multi-query processing/scheduling
• real-time processing
11. The Basic Idea of DataCell
• Stream tuples are first stored in (appended to) baskets.
• We evaluate the continuous queries over the baskets.
• Instead of throwing each incoming tuple against the set of waiting queries (the data-stream approach), first collect the data and then throw the queries against the set of tuples (the database approach).
• Once a tuple is seen, it is dropped from its basket.
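The basket idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the MonetDB implementation: the `Basket` class, its `consume` method and the `run_queries` helper are hypothetical names. Tuples are appended as they arrive, the waiting queries are evaluated over the collected batch, and the seen tuples are dropped.

```python
from typing import Callable, Dict, List


class Basket:
    """Hypothetical basket: an append-only buffer that is emptied when read."""

    def __init__(self) -> None:
        self.tuples: List[int] = []

    def append(self, t: int) -> None:
        # Incoming stream tuples are first stored in the basket.
        self.tuples.append(t)

    def consume(self) -> List[int]:
        # Hand out the collected batch and drop it from the basket.
        batch, self.tuples = self.tuples, []
        return batch


def run_queries(basket: Basket, queries: Dict[str, Callable[[int], bool]]) -> Dict[str, List[int]]:
    """Throw every waiting query against the collected batch of tuples."""
    batch = basket.consume()
    return {name: [t for t in batch if pred(t)] for name, pred in queries.items()}


basket = Basket()
for incoming in (4, 15, 8):
    basket.append(incoming)

results = run_queries(basket, {"gt10": lambda t: t > 10, "even": lambda t: t % 2 == 0})
```

Because the queries run over a batch rather than one tuple at a time, they can in principle be ordinary set-oriented relational queries.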
12. The MonetDB/DataCell stack
SQL Query
SQL
Query parser
Query Optimizer
MAL
MAL Interpreter
Query Executor
13. The MonetDB/DataCell stack
SQL Query
SQL
Query parser + CQ
Query Optimizer + DC opt
Continuous Query Scheduler
MAL
MAL Interpreter
Query Executor
14. DataCell Components
Receptor <=> Listens to a stream
Emitter <=> Delivers events to the clients
Factory <=> Continuous query
Basket <=> Holds events
[Figure: Input Stream -> R (receptor) -> Q (factory) -> E (emitter) -> Output Stream]
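A minimal sketch of how the four components might fit together, written in Python purely for illustration. The function names mirror the component names on the slide, but their signatures, the use of `deque` as a basket and the integer events are assumptions, not the DataCell API.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List

Basket = Deque[int]  # a basket holds events; modelled here as a queue of ints


def receptor(stream: Iterable[int], out: Basket) -> None:
    """Listen to a stream and append each event to an input basket."""
    for event in stream:
        out.append(event)


def factory(inp: Basket, out: Basket, predicate: Callable[[int], bool]) -> None:
    """Continuous query: consume the input basket, fill the output basket."""
    while inp:
        event = inp.popleft()
        if predicate(event):
            out.append(event)


def emitter(inp: Basket, deliver: Callable[[int], None]) -> None:
    """Deliver every event in the output basket to the client."""
    while inp:
        deliver(inp.popleft())


input_basket: Basket = deque()
output_basket: Basket = deque()
delivered: List[int] = []

receptor([3, 42, 17], input_basket)
factory(input_basket, output_basket, lambda e: e > 10)
emitter(output_basket, delivered.append)
```

In this sketch the three stages run one after another; in a real system they would presumably be scheduled repeatedly as new events arrive.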
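15. DataCell Architecture
[Figure: DataCell architecture. Components shown: SQL Compiler, MAL Optimizer, Data Columns, Continuous Query Scheduler and the DataCell itself, in which receptors R1-R3 and emitters E1-E3 are connected through factories and baskets. Baskets are drawn as small tables with an id column and one attribute (a, b, c, k, m, n and primed variants such as a', b', k', k'', n').]
Legend: Basket, Receptor, Disk Storage, Emitter, Factory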
19. DataCell Architecture
[Figure: the DataCell architecture of the preceding slides, with a SPARQL Compiler added next to the SQL Compiler.]
26. Basket Expressions
• Syntax: an SQL sub-query surrounded by square brackets
• Semantics: all qualifying tuples in a basket expression are removed by the factories
Tumbling window
Q1: Select * From [Select * from X top 3] as S where S.a>10;
Sliding window
Q2: SELECT * FROM (
[Select * From X top 1]
Union
Select * From X top 2 offset 1) as S
WHERE S.a>10;
[Figure: basket X holds the values 12, 3, 100, 14; Q1 and Q2 each output 12 and 100.]
• Flexible, expressive continuous queries, by selectively picking the data to process from a basket
• Allows processing of predicate windows on a stream
• Out-of-order processing
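The window semantics of Q1 and Q2 can be traced with a small Python model. This is a sketch under assumptions: the basket X is modelled as a list of values of attribute a, the function names are hypothetical, and SQL UNION is approximated by order-preserving de-duplication. The bracketed sub-query removes the tuples it reads; the unbracketed one only peeks.

```python
from typing import List


def q1_tumbling(x: List[int]) -> List[int]:
    """Q1: [select * from X top 3] reads and removes the first three tuples."""
    window = x[:3]
    del x[:3]                      # basket expression: the tuples leave the basket
    return [a for a in window if a > 10]


def q2_sliding(x: List[int]) -> List[int]:
    """Q2: remove only the first tuple, but also look at the next two."""
    consumed = x[:1]               # [select * from X top 1]
    peeked = x[1:3]                # select * from X top 2 offset 1 (not removed)
    del x[:1]                      # the window slides by one tuple
    window = list(dict.fromkeys(consumed + peeked))  # stand-in for UNION
    return [a for a in window if a > 10]


x1 = [12, 3, 100, 14]
q1_result = q1_tumbling(x1)

x2 = [12, 3, 100, 14]
q2_result = q2_sliding(x2)
```

Both queries report 12 and 100 on the first evaluation, matching the slide figure, but they leave different basket contents behind: Q1 leaves only 14, while Q2 leaves 3, 100 and 14, so its next window overlaps the previous one.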
27. Query processing strategies
Separate Baskets
• Each continuous query is encapsulated within a single factory
• Each factory f has its own input baskets, which are accessed only by f
• If more than one factory is interested in the same data, we create multiple copies of this data
• Factories are completely independent
• Exploit column-store to minimize the overhead of replication
[Figure: basket b is replicated by a copy factory Qcopy into bcopy1, bcopy2 and bcopy3, which are read by Q1, Q2 and Q3 respectively.]
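The separate-baskets strategy might look as follows in a toy Python model. The names `replicate` and `run_factory`, the list-based baskets and the three predicates are illustrative assumptions. A copy step moves each tuple of b into one private basket per query, after which every factory works on its own copy without coordination.

```python
from typing import Callable, Dict, List


def replicate(b: List[int], copies: Dict[str, List[int]]) -> None:
    """Copy factory: move every tuple of b into each private basket."""
    for private in copies.values():
        private.extend(b)
    b.clear()


def run_factory(private: List[int], predicate: Callable[[int], bool]) -> List[int]:
    """A factory reads and empties only its own basket."""
    result = [t for t in private if predicate(t)]
    private.clear()
    return result


b = [4, 15, 8]
copies: Dict[str, List[int]] = {"Q1": [], "Q2": [], "Q3": []}
replicate(b, copies)

r1 = run_factory(copies["Q1"], lambda t: t > 10)
r2 = run_factory(copies["Q2"], lambda t: t % 2 == 0)
r3 = run_factory(copies["Q3"], lambda t: t < 5)
```

The price of this independence is that every tuple is stored once per interested query.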
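32. Query processing strategies
Shared Baskets
• Exploit query similarities to avoid replication
• Baskets are shared among factories
• Two new (cheap) factories: Locker and Unlocker
[Figure: shared basket b is read by Q1, Q2 and Q3; each query Qi is preceded by a locker factory FLi (Lock) and followed by an unlocker factory FUi (Unlock).]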
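The slides do not spell out what the Locker and Unlocker factories do internally, so the following Python sketch shows only one plausible reading, with every name (`SharedBasket`, `lock`, `unlock`, `run_query`) an assumption. Each locker registers its query as a pending reader of the shared basket, and the tuples are dropped only when the last unlocker has run.

```python
from typing import Callable, List, Set


class SharedBasket:
    """Hypothetical shared basket: one copy of the data, several readers."""

    def __init__(self) -> None:
        self.tuples: List[int] = []
        self.pending: Set[str] = set()

    def lock(self, reader: str) -> None:
        # Locker factory (FLi): mark the basket as still needed by this reader.
        self.pending.add(reader)

    def unlock(self, reader: str) -> None:
        # Unlocker factory (FUi): release the reader; the last one drops the tuples.
        self.pending.discard(reader)
        if not self.pending:
            self.tuples.clear()


def run_query(basket: SharedBasket, reader: str, predicate: Callable[[int], bool]) -> List[int]:
    """Read the shared tuples without removing them, then run the unlocker."""
    result = [t for t in basket.tuples if predicate(t)]
    basket.unlock(reader)
    return result


b = SharedBasket()
b.tuples.extend([4, 15, 8])
for name in ("Q1", "Q2", "Q3"):
    b.lock(name)

r1 = run_query(b, "Q1", lambda t: t > 10)
after_q1 = list(b.tuples)          # still present: Q2 and Q3 have not finished
r2 = run_query(b, "Q2", lambda t: t % 2 == 0)
r3 = run_query(b, "Q3", lambda t: t < 5)
```

Compared with separate baskets, the data is stored once, at the cost of the extra lock and unlock steps around each query.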
33. Summary
+ = DataCell