MonetDB/DataCell

Exploiting the Power of Relational Databases for Efficient Stream Processing

CWI
Project Meeting @ Innsbruck
Feb 28 - Mar 04, 2011




Wednesday, March 02, 2011
DBMS versus DSMS
   DBMS
   [Figure: incoming data is stored in the DB (disk storage); a one-time query is submitted and an answer is returned]
   1    Store incoming tuples
   2    Submit one-time query
   3    Query processing on the already stored data
   4    Create answer

   DSMS
   [Figure: continuous queries are kept in memory; the input stream passes through them and notifications are sent out]
   1    Submit continuous queries
   2    Incoming streams
   3    Input stream is processed on the fly
   4    The produced results are continuously delivered to the clients

   A data stream is a never-ending sequence of tuples.

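The two numbered workflows can be contrasted in a few lines of Python. This is a minimal sketch, not DataCell code: the classes `StoredTable` and `StreamEngine` are hypothetical, a Python list stands in for disk storage, and a client is modelled as a callback function.

```python
from typing import Callable, List, Tuple

Predicate = Callable[[int], bool]


class StoredTable:
    """DBMS side (hypothetical): tuples are stored first and queried later."""

    def __init__(self) -> None:
        self.rows: List[int] = []  # stands in for disk storage

    def insert(self, value: int) -> None:
        self.rows.append(value)  # 1: store incoming tuples

    def one_time_query(self, pred: Predicate) -> List[int]:
        # 2-4: the query is submitted, evaluated once over the stored rows,
        # and the complete answer is returned
        return [v for v in self.rows if pred(v)]


class StreamEngine:
    """DSMS side (hypothetical): queries are registered first, tuples are processed on the fly."""

    def __init__(self) -> None:
        # continuous queries live in memory as (predicate, client callback) pairs
        self.queries: List[Tuple[Predicate, Callable[[int], None]]] = []

    def register(self, pred: Predicate, notify: Callable[[int], None]) -> None:
        self.queries.append((pred, notify))  # 1: submit a continuous query

    def push(self, value: int) -> None:
        # 2: a tuple arrives; 3: it is evaluated immediately and never stored
        for pred, notify in self.queries:
            if pred(value):
                notify(value)  # 4: the result is delivered to the client


table = StoredTable()
for v in [12, 3, 100, 14]:
    table.insert(v)
answer = table.one_time_query(lambda v: v > 10)  # [12, 100, 14]

received: List[int] = []
engine = StreamEngine()
engine.register(lambda v: v > 10, received.append)
for v in [12, 3, 100, 14]:
    engine.push(v)
# received now holds [12, 100, 14], delivered one tuple at a time
```

The answers coincide here; the difference is when the work happens. The table keeps every tuple and answers once on request, while the engine keeps only the queries and pushes each match to the client as soon as the tuple arrives.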
One-time Queries versus Continuous Queries
   [Figure: timeline of data arrival (t_n, t_n+1) marking the arrival time of query q, for a one-time query and a continuous query]

   One-time query
     • Evaluated once over the already stored tuples

   Continuous query
     • Waits for future incoming tuples
     • Evaluated continuously as new tuples arrive



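The timeline distinction can be stated as a filter on arrival times. In this sketch, tuples are hypothetical `(arrival time, value)` pairs, `t_q` is the arrival time of q, and the continuous query is modelled as a generator that yields an updated answer for each new tuple; treating a tuple that arrives exactly at `t_q` as already stored is an assumption.

```python
from typing import Iterator, List, Tuple

Event = Tuple[int, int]  # (arrival time, value)


def one_time(stream: List[Event], t_q: int) -> List[int]:
    """Evaluated once, at arrival time t_q, over the tuples already stored."""
    return [v for (t, v) in stream if t <= t_q]


def continuous(stream: List[Event], t_q: int) -> Iterator[List[int]]:
    """Waits for tuples that arrive after t_q and yields an updated answer
    every time a new one arrives."""
    answer: List[int] = []
    for t, v in stream:  # stream is assumed ordered by arrival time
        if t > t_q:
            answer.append(v)
            yield list(answer)


stream = [(1, 5), (2, 7), (3, 9), (4, 11)]
print(one_time(stream, 2))          # [5, 7]
print(list(continuous(stream, 2)))  # [[9], [9, 11]]
```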
Observation
   • Today, stream systems are built from scratch

   • Operators and optimizations are redesigned for each system

   • Relational databases are considered inefficient and too complex for stream processing

   • Modern stream applications need to manage both stored and streaming data




Goals
   • We design DataCell on top of an existing database kernel (MonetDB)

   • Exploit database techniques: query optimization and relational operators

   • Provide full language functionality (SQL:2003)

   • Research questions
      • Is it viable?
      • Multi-query processing and scheduling
      • Real-time processing



The Basic Idea of DataCell
      • Stream tuples are first stored in (appended to) baskets.

      • The continuous queries are then evaluated over the baskets.
             Instead of throwing each incoming tuple against the waiting query set (the data stream approach),
             DataCell first collects the data and then throws the queries against the collected tuples (the database approach).

      • Once a tuple has been seen by the queries, it is dropped from its basket.

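The difference between the two evaluation orders can be sketched as follows. The `Basket` class and both functions are hypothetical illustrations, and each query is modelled as a function from a batch of tuples to its results.

```python
from typing import Callable, Dict, List

Query = Callable[[List[int]], List[int]]  # maps a batch of tuples to results


def tuple_at_a_time(stream: List[int], queries: Dict[str, Query]) -> Dict[str, List[int]]:
    """Data stream approach: each incoming tuple is thrown against every query."""
    out: Dict[str, List[int]] = {name: [] for name in queries}
    for tup in stream:
        for name, q in queries.items():
            out[name].extend(q([tup]))  # one evaluation per tuple per query
    return out


class Basket:
    """Hypothetical basket: an append-only buffer of stream tuples."""

    def __init__(self) -> None:
        self.tuples: List[int] = []

    def append(self, tup: int) -> None:
        self.tuples.append(tup)


def evaluate_over_basket(basket: Basket, queries: Dict[str, Query]) -> Dict[str, List[int]]:
    """Basket approach: collect first, throw the queries at the whole basket
    in one bulk pass, then drop the tuples that were seen."""
    batch = list(basket.tuples)
    out = {name: q(batch) for name, q in queries.items()}
    basket.tuples.clear()
    return out


queries: Dict[str, Query] = {
    "big": lambda b: [x for x in b if x > 10],
    "small": lambda b: [x for x in b if x <= 10],
}
stream = [12, 3, 100, 14]
basket = Basket()
for x in stream:
    basket.append(x)
result = evaluate_over_basket(basket, queries)  # {"big": [12, 100, 14], "small": [3]}
```

For selections like these, both strategies produce the same answer; what changes is that the basket version invokes each query once per batch instead of once per tuple, which is presumably where a database kernel's bulk operators pay off.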

The MonetDB/DataCell stack
                                    SQL query
                                      ↓  SQL
                                    Query parser
                                      ↓
                                    Query optimizer
                                      ↓  MAL
                                    MAL interpreter
                                      ↓
                                    Query executor




The MonetDB/DataCell stack
                                    SQL query
                                      ↓  SQL
                                    Query parser + CQ
                                      ↓
                                    Query optimizer + DC opt
                                      ↓
                                    Continuous Query Scheduler
                                      ↓  MAL
                                    MAL interpreter
                                      ↓
                                    Query executor

                                    (CQ: continuous queries; DC opt: DataCell optimizations)




DataCell Components
                            Receptor   <=>   Listens to a stream
                            Emitter    <=>   Delivers events to the clients
                            Factory    <=>   Continuous query
                            Basket     <=>   Holds events

        Input stream  →  R  →  Q  →  E  →  Output stream

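The four components can be wired into the input stream → R → Q → E → output stream pipeline of the slide. A minimal sketch, assuming baskets are in-memory FIFO queues (`collections.deque`) of integer events and that a factory is a simple filter; the class names mirror the slide, but their methods (`listen`, `fire`, `deliver`) are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


class Receptor:
    """Listens to a stream and appends every event to its basket."""

    def __init__(self, out: Deque[int]) -> None:
        self.out = out

    def listen(self, source: Iterable[int]) -> None:
        for event in source:
            self.out.append(event)


class Factory:
    """A continuous query: consumes its input basket, fills its output basket."""

    def __init__(self, inp: Deque[int], out: Deque[int], pred: Callable[[int], bool]) -> None:
        self.inp, self.out, self.pred = inp, out, pred

    def fire(self) -> None:
        while self.inp:
            event = self.inp.popleft()  # the event is removed once seen
            if self.pred(event):
                self.out.append(event)


class Emitter:
    """Delivers the events in its basket to the subscribed clients."""

    def __init__(self, inp: Deque[int], clients: List[Callable[[int], None]]) -> None:
        self.inp, self.clients = inp, clients

    def deliver(self) -> None:
        while self.inp:
            event = self.inp.popleft()
            for client in self.clients:
                client(event)


# Input stream -> R -> basket -> Q -> basket -> E -> output stream
in_basket: Deque[int] = deque()
out_basket: Deque[int] = deque()
received: List[int] = []
receptor = Receptor(in_basket)
factory = Factory(in_basket, out_basket, lambda e: e > 10)
emitter = Emitter(out_basket, [received.append])

receptor.listen([12, 3, 100, 14])
factory.fire()
emitter.deliver()
```

Because every component talks only to baskets, the receptor, factory and emitter never call each other directly; a scheduler can decide independently when each one runs.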

DataCell Architecture
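   [Figure: DataCell architecture. An SQL compiler and a SPARQL compiler feed the MAL optimizer. Inside DataCell, receptors R1, R2 and R3 fill baskets held as data columns (e.g. id/a, id/b, id/c, id/k, id/m, id/n); the continuous query scheduler runs the factories, whose result baskets (e.g. id/a', id/b', id/k', id/k'', id/n') are delivered by emitters E1, E2 and E3. Disk storage is shown alongside. Legend: basket, receptor, emitter, factory.]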
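The continuous query scheduler's job of deciding which factory runs next can be illustrated with a toy policy. This sketch assumes a factory is enabled whenever its input basket is non-empty and uses a plain round-robin loop; the actual DataCell scheduling policy is not described on these slides, and `Factory`, `ready`, `fire` and `schedule` are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, List


class Factory:
    """Hypothetical factory: consumes its input basket and appends fn(tuple)
    to its output basket."""

    def __init__(self, name: str, inp: Deque[int], out: Deque[int],
                 fn: Callable[[int], List[int]]) -> None:
        self.name, self.inp, self.out, self.fn = name, inp, out, fn

    def ready(self) -> bool:
        # enabled only when its input basket holds at least one tuple
        return len(self.inp) > 0

    def fire(self) -> None:
        while self.inp:
            self.out.extend(self.fn(self.inp.popleft()))


def schedule(factories: List[Factory]) -> List[str]:
    """Round-robin sketch: repeatedly fire every enabled factory until a full
    pass finds none enabled. Returns the firing order."""
    trace: List[str] = []
    progress = True
    while progress:
        progress = False
        for f in factories:
            if f.ready():
                f.fire()
                trace.append(f.name)
                progress = True
    return trace


# A receptor has already filled b_in; two factories are chained through b_mid.
b_in: Deque[int] = deque([12, 3, 100, 14])
b_mid: Deque[int] = deque()
b_out: Deque[int] = deque()  # would be drained by an emitter
filt = Factory("filter", b_in, b_mid, lambda x: [x] if x > 10 else [])
scale = Factory("scale", b_mid, b_out, lambda x: [x * 2])
trace = schedule([scale, filt])  # listed out of order on purpose
```

Even though `scale` is listed first, it only fires once `filter` has put tuples into `b_mid`; the data in the baskets, not the list order, drives execution.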
Basket Expressions
      q Syntax:
             It is an SQL sub-query surrounded by square brackets

      q Semantics:
            All qualifying tuples in a basket expression are removed by the factories

           Tumbling window
           Q1: Select * From [Select * from X top 3] as S where S.a>10;

           Sliding window
           Q2:      SELECT * FROM (
                    [Select * From X top 1]
                     Union
                     Select * From X top 2 offset 1) as S
                     WHERE S.a>10;

      q Flexible/expressive continuous queries, by selectively picking the data to
         process from a basket

      q Allow to process predicate windows on a stream.
         q out of order processing


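The window semantics of Q1 and Q2 can be simulated on the example basket, with X modelled as a deque of values of column a in arrival order. This is a sketch under stated assumptions: a factory fires only when a full window of three tuples is present, `TOP n` means the first n tuples in arrival order, and `predicate_window` is a hypothetical third basket expression showing how a predicate, rather than a position, can select the tuples to consume.

```python
from collections import deque
from typing import Deque, List


def q1_tumbling(x: Deque[int], size: int = 3) -> List[int]:
    """Q1: SELECT * FROM [SELECT * FROM X TOP 3] AS S WHERE S.a > 10;
    every tuple inside the square brackets is removed from X."""
    if len(x) < size:
        return []  # assumption: the factory waits for a full window
    window = [x.popleft() for _ in range(size)]
    return [a for a in window if a > 10]


def q2_sliding(x: Deque[int], size: int = 3) -> List[int]:
    """Q2: only [SELECT * FROM X TOP 1] is consumed; TOP 2 OFFSET 1 is read
    but left in X, so the window slides by one tuple per evaluation."""
    if len(x) < size:
        return []
    head = [x.popleft()]         # inside the brackets: removed
    rest = list(x)[:size - 1]    # outside the brackets: read only
    # list concatenation ignores UNION's duplicate elimination
    return [a for a in head + rest if a > 10]


def predicate_window(x: Deque[int], threshold: int = 10) -> List[int]:
    """Hypothetical basket expression [SELECT * FROM X WHERE X.a > 10]:
    qualifying tuples are removed wherever they sit; the rest stay in X."""
    taken = [a for a in x if a > threshold]
    kept = [a for a in x if a <= threshold]
    x.clear()
    x.extend(kept)
    return taken


x1 = deque([12, 3, 100, 14])
r1 = q1_tumbling(x1)        # [12, 100]; x1 is now [14]
x2 = deque([12, 3, 100, 14])
r2 = q2_sliding(x2)         # [12, 100]; x2 is now [3, 100, 14]
r2_next = q2_sliding(x2)    # [100, 14]; x2 is now [100, 14]
x3 = deque([12, 3, 100, 14])
r3 = predicate_window(x3)   # [12, 100, 14]; x3 is now [3]
```

The only difference between the tumbling and the sliding variant is how much of the window sits inside the square brackets, which is exactly what decides what the factory consumes.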
Query processing strategies
            Separate Baskets

     • Each continuous query is encapsulated within a single factory
     • Each factory f has its own input baskets, which are accessed only by f
     • If more than one factory is interested in the same data, we create
          multiple copies of this data

     • Factories are completely independent
     • The column-store layout is exploited to minimize the overhead of replication

     [Figure: a copy factory Qcopy reads basket b and fills bcopy1, bcopy2 and bcopy3, which are consumed by Q1, Q2 and Q3 respectively]

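The separate-baskets strategy can be sketched with a copy factory that replicates one basket into private baskets. The function names and the three example predicates are hypothetical, and whole tuples are copied; the sketch does not model the column-store optimizations mentioned above.

```python
from collections import deque
from typing import Callable, Deque, List


def copy_factory(b: Deque[int], copies: List[Deque[int]]) -> None:
    """Qcopy: moves each tuple of the shared input basket b into every
    private basket, leaving b empty."""
    while b:
        tup = b.popleft()
        for c in copies:
            c.append(tup)


def private_query(own: Deque[int], pred: Callable[[int], bool]) -> List[int]:
    """Factory Qi: reads and consumes only its own basket, so it never has to
    coordinate with the other factories."""
    out: List[int] = []
    while own:
        tup = own.popleft()
        if pred(tup):
            out.append(tup)
    return out


b = deque([12, 3, 100, 14])
bcopies: List[Deque[int]] = [deque(), deque(), deque()]
copy_factory(b, bcopies)
preds = [lambda a: a > 10, lambda a: a < 10, lambda a: a % 4 == 0]
results = [private_query(c, p) for c, p in zip(bcopies, preds)]
# results: [[12, 100, 14], [3], [12, 100]]
```

The price of this independence is visible in `copy_factory`: every tuple is stored once per interested query.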
Query processing strategies
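          Shared Baskets

      • Exploit query similarities to avoid replication
      • Baskets are shared among factories
      • Two new, cheap factories are introduced: Locker and Unlocker

      [Figure: shared basket b feeds a Lock factory, followed by three parallel chains FL1 → Q1 → FU1, FL2 → Q2 → FU2 and FL3 → Q3 → FU3, which all lead to an Unlock factory]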


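The shared-baskets strategy can be sketched with a single basket guarded by a lock. This is an interpretation, not the DataCell implementation: it assumes the Locker fixes the set of tuples for one round, every query reads those tuples in place, and the Unlocker drops them once all queries have seen them before releasing the basket. The per-query factories FL1 to FL3 and FU1 to FU3 are not modelled, and all names are hypothetical.

```python
import threading
from typing import Callable, List


class SharedBasket:
    """A basket read by several factories; receptors wait while it is locked."""

    def __init__(self) -> None:
        self.tuples: List[int] = []
        self.lock = threading.Lock()

    def append(self, tup: int) -> None:
        with self.lock:  # a receptor blocks while a query round holds the lock
            self.tuples.append(tup)


def locker(b: SharedBasket) -> int:
    """Locker: takes the lock and fixes how many tuples this round covers."""
    b.lock.acquire()
    return len(b.tuples)


def shared_query(b: SharedBasket, n: int, pred: Callable[[int], bool]) -> List[int]:
    """Qi: reads the first n tuples in place; no copy, no removal."""
    return [t for t in b.tuples[:n] if pred(t)]


def unlocker(b: SharedBasket, n: int) -> None:
    """Unlocker: every query has now seen the n tuples, so drop them and
    release the basket."""
    del b.tuples[:n]
    b.lock.release()


basket = SharedBasket()
for t in [12, 3, 100, 14]:
    basket.append(t)
n = locker(basket)
results = [shared_query(basket, n, p) for p in (lambda a: a > 10, lambda a: a < 10)]
unlocker(basket, n)
# results: [[12, 100, 14], [3]]; the basket is empty and unlocked again
```

Compared with separate baskets, each tuple is stored once; the cost moves to coordination, since no query may drop a tuple until the whole round is finished.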
Summary




                            [Figure: summary graphic, image + image = DataCell]




Linking Smart Cities Datasets with Human Computation: the case of UrbanMatch
 
SciQL, Bridging the Gap between Science and Relational DBMS
SciQL, Bridging the Gap between Science and Relational DBMSSciQL, Bridging the Gap between Science and Relational DBMS
SciQL, Bridging the Gap between Science and Relational DBMS
 
CLODA: A Crowdsourced Linked Open Data Architecture
CLODA: A Crowdsourced Linked Open Data ArchitectureCLODA: A Crowdsourced Linked Open Data Architecture
CLODA: A Crowdsourced Linked Open Data Architecture
 
Scalable Nonmonotonic Reasoning over RDF Data Using MapReduce
Scalable Nonmonotonic Reasoning over RDF Data Using MapReduceScalable Nonmonotonic Reasoning over RDF Data Using MapReduce
Scalable Nonmonotonic Reasoning over RDF Data Using MapReduce
 
Data and Knowledge Evolution
Data and Knowledge Evolution  Data and Knowledge Evolution
Data and Knowledge Evolution
 
Evolution of Workflow Provenance Information in the Presence of Custom Infere...
Evolution of Workflow Provenance Information in the Presence of Custom Infere...Evolution of Workflow Provenance Information in the Presence of Custom Infere...
Evolution of Workflow Provenance Information in the Presence of Custom Infere...
 
Access Control for RDF graphs using Abstract Models
Access Control for RDF graphs using Abstract ModelsAccess Control for RDF graphs using Abstract Models
Access Control for RDF graphs using Abstract Models
 
Arrays in Databases, the next frontier?
Arrays in Databases, the next frontier?Arrays in Databases, the next frontier?
Arrays in Databases, the next frontier?
 
Abstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF DatasetsAbstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF Datasets
 
Towards Parallel Nonmonotonic Reasoning with Billions of Facts
Towards Parallel Nonmonotonic Reasoning with Billions of FactsTowards Parallel Nonmonotonic Reasoning with Billions of Facts
Towards Parallel Nonmonotonic Reasoning with Billions of Facts
 
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
Automation in Cytomics: A Modern RDBMS Based Platform for Image Analysis and ...
 
Heuristic based Query Optimisation for SPARQL
Heuristic based Query Optimisation for SPARQLHeuristic based Query Optimisation for SPARQL
Heuristic based Query Optimisation for SPARQL
 

Recently uploaded

Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Recently uploaded (20)

Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

Exploiting Relational Databases for Efficient Stream Processing

  • 1. MonetDB/DataCell: Exploiting the Power of Relational Databases for Efficient Stream Processing. CWI. Project Meeting @ Innsbruck, Feb 28 - Mar 04, 2011. Wednesday, March 02, 2011
  • 3. DBMS versus DSMS. DBMS: (1) store incoming tuples on disk; (2) submit a one-time query; (3) process the query over the already stored data; (4) create the answer. DSMS: (1) submit continuous queries; (2) input streams arrive; (3) each input stream is processed on the fly, in memory; (4) the produced results are continuously delivered to the clients as notifications. A data stream is a never-ending sequence of tuples.
  • 4. One-time Queries versus Continuous Queries. [Figure: timeline showing when a one-time and a continuous query arrive relative to the data arrival times t_n and t_{n+1}.] One-time query: evaluated once over the already stored tuples. Continuous query: waits for future incoming tuples and is evaluated continuously as new tuples arrive.
  • 9. Observation • Stream systems today are built from scratch, redesigning operators and optimizations • Relational databases are considered inefficient and too complex for stream processing • Yet modern stream applications require management of both stored and streaming data
  • 10. Goals • Design DataCell on top of an existing database kernel • Exploit database techniques, query optimization, and operators • Provide full language functionality (SQL:2003) • Research questions: Is it viable? Multi-query processing and scheduling; real-time processing
  • 11. The Basic Idea of DataCell • Stream tuples are first stored in (appended to) baskets. • Continuous queries are evaluated over the baskets: instead of throwing each incoming tuple against the waiting query set (the data stream approach), first collect the data and then throw the queries against the collected tuples (the database approach). • Once a tuple has been seen, it is dropped from its basket.
  • 12. The MonetDB/DataCell stack: SQL query → SQL query parser → query optimizer → MAL → MAL interpreter (query executor)
  • 13. The MonetDB/DataCell stack: SQL query → SQL query parser + CQ (continuous queries) → query optimizer + DC opt (DataCell optimizations) → continuous query scheduler → MAL → MAL interpreter (query executor)
  • 14. DataCell Components • Receptor: listens to a stream • Emitter: delivers events to the clients • Factory: a continuous query • Basket: holds events. [Figure: input stream → R (receptor) → Q (factory) → E (emitter) → output stream.]
  • 15. DataCell Architecture. [Figure: DataCell architecture. Components: SQL compiler, MAL optimizer, continuous query scheduler, data columns on disk storage, receptors R1-R3, emitters E1-E3, factories, and baskets of (id, value) columns such as (id, a), (id, k), (id, a’), (id, k’’). Legend: basket, receptor, emitter, factory, disk storage.]
  • 19. DataCell Architecture. [Figure: the architecture of slide 15 with a SPARQL compiler added next to the SQL compiler.]
  • 20. Basket Expressions • Syntax: an SQL sub-query surrounded by square brackets • Semantics: all tuples that qualify in a basket expression are removed from the basket by the factories. Tumbling window, Q1: SELECT * FROM [SELECT * FROM X TOP 3] AS S WHERE S.a > 10; Sliding window, Q2: SELECT * FROM ([SELECT * FROM X TOP 1] UNION SELECT * FROM X TOP 2 OFFSET 1) AS S WHERE S.a > 10; • Flexible, expressive continuous queries, by selectively picking the data to process from a basket • Allow processing of predicate windows on a stream • Out-of-order processing
  • 21-26. Basket Expressions (worked example). Each query is shown over its own copy of basket X, holding tuples with a = 12, 3, 100, 14. Q1 (tumbling window) selects 12, 3, 100 in its basket expression, removes all three from X, and returns 12 and 100. Q2 (sliding window) reads 12 through its basket expression and 3, 100 through the plain sub-query, returns 12 and 100, and removes only 12, so 3, 100, and 14 remain in X for its next evaluation.
  • 27. Query processing strategies: Separate Baskets • Each continuous query is encapsulated within a single factory • Each factory f has its own input baskets, which are accessed only by f • If more than one factory is interested in the same data, we create multiple copies of this data • Factories are completely independent • Exploit the column-store to minimize the overhead of replication. [Figure: a copy factory Qcopy replicates basket b into bcopy1, bcopy2, bcopy3, read by Q1, Q2, Q3 respectively.]
  • 32. Query processing strategies: Shared Baskets • Exploit query similarities to avoid replication • Baskets are shared among factories • Two new (cheap) factories: Locker and Unlocker. [Figure: basket b is shared by Q1, Q2, Q3; each query Qi is preceded by a locker factory FLi and followed by an unlocker factory FUi, with Lock and Unlock steps acting on b.]
  • 33. Summary: [figure] + [figure] = DataCell
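The contrast on slide 4 can be sketched in Python. This is a minimal illustration under stated assumptions, not DataCell code: tuples are bare integers (the value of attribute a), a query is a Python predicate, and the names `one_time_query` and `ContinuousQuery` are hypothetical.

```python
from typing import Callable, List

# Hypothetical sketch: a tuple is just the integer value of attribute a.
Predicate = Callable[[int], bool]


def one_time_query(stored: List[int], pred: Predicate) -> List[int]:
    """Evaluated once, over the tuples already stored when the query arrives."""
    return [t for t in stored if pred(t)]


class ContinuousQuery:
    """Registered once; waits for future tuples and is re-evaluated on each arrival."""

    def __init__(self, pred: Predicate) -> None:
        self.pred = pred
        self.results: List[int] = []

    def on_arrival(self, t: int) -> None:
        # Each newly arriving tuple is checked as soon as it shows up.
        if self.pred(t):
            self.results.append(t)


stored = [12, 3]                                  # arrived before the query (up to t_n)
print(one_time_query(stored, lambda a: a > 10))   # [12]

cq = ContinuousQuery(lambda a: a > 10)
for t in [100, 14, 5]:                            # arrive after the query (t_{n+1}, ...)
    cq.on_arrival(t)
print(cq.results)                                 # [100, 14]
```

The one-time query never sees 100 or 14, and the continuous query never sees the already stored 12.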
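Slide 11's basket idea, collecting tuples first and then throwing the whole query set against them, might look like this. Assumptions: a basket is a plain Python list wrapped in a hypothetical `Basket` class, all registered queries are evaluated in one pass, and the scheduling that decides when DataCell fires a query is not modeled.

```python
from typing import Callable, Dict, List

Predicate = Callable[[int], bool]


class Basket:
    """Hypothetical basket: an append-only buffer of stream tuples."""

    def __init__(self) -> None:
        self.tuples: List[int] = []

    def append(self, t: int) -> None:
        self.tuples.append(t)


def evaluate(basket: Basket, queries: Dict[str, Predicate]) -> Dict[str, List[int]]:
    """Throw the whole query set against the collected tuples at once,
    then drop every tuple that has been seen."""
    batch = basket.tuples
    answers = {name: [t for t in batch if pred(t)] for name, pred in queries.items()}
    basket.tuples = []  # once a tuple is seen, it is dropped from its basket
    return answers


b = Basket()
for t in [12, 3, 100, 14]:   # incoming stream tuples are appended first
    b.append(t)

out = evaluate(b, {"big": lambda a: a > 10, "small": lambda a: a < 10})
print(out)        # {'big': [12, 100, 14], 'small': [3]}
print(b.tuples)   # []
```

Evaluating over a batch rather than tuple by tuple is what lets a database kernel reuse its bulk operators.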
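The four components on slide 14 can be wired together as in the following sketch. The class and method names (`Receptor.listen`, `Factory.fire`, `Emitter.flush`) are hypothetical, baskets are in-memory lists, and the factory simply consumes everything in its input basket; DataCell's actual components live inside the MonetDB kernel, not in Python objects.

```python
from typing import Callable, Iterable, List


class Basket:
    """Holds events between components (hypothetical in-memory list)."""

    def __init__(self) -> None:
        self.events: List[int] = []


class Receptor:
    """Listens to an input stream and appends its events to a basket."""

    def __init__(self, out: Basket) -> None:
        self.out = out

    def listen(self, stream: Iterable[int]) -> None:
        self.out.events.extend(stream)


class Factory:
    """Wraps one continuous query: consumes its input basket, fills its output basket."""

    def __init__(self, inp: Basket, out: Basket, query: Callable[[int], bool]) -> None:
        self.inp, self.out, self.query = inp, out, query

    def fire(self) -> None:
        batch, self.inp.events = self.inp.events, []   # consume what is there
        self.out.events.extend(e for e in batch if self.query(e))


class Emitter:
    """Delivers the events in its basket to the clients (here: a callback)."""

    def __init__(self, inp: Basket, deliver: Callable[[int], None]) -> None:
        self.inp, self.deliver = inp, deliver

    def flush(self) -> None:
        batch, self.inp.events = self.inp.events, []
        for e in batch:
            self.deliver(e)


b_in, b_out = Basket(), Basket()
received: List[int] = []
r = Receptor(b_in)
q = Factory(b_in, b_out, lambda a: a > 10)
e = Emitter(b_out, received.append)

r.listen([12, 3, 100, 14])   # R: input stream -> basket
q.fire()                     # Q: continuous query over the basket
e.flush()                    # E: basket -> output stream
print(received)              # [12, 100, 14]
```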
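The windows on slides 20-26 follow from one rule: tuples selected inside square brackets are removed from the basket, while a plain sub-query only reads them. The sketch below models that rule over a Python list, assuming `TOP n OFFSET k` means the n tuples starting at position k in arrival order; UNION's duplicate elimination is not modeled because the example values are distinct, and `take` and `basket_expr` are hypothetical helpers.

```python
from typing import List


def take(basket: List[int], n: int, offset: int = 0) -> List[int]:
    """Plain sub-query SELECT * FROM X TOP n OFFSET offset: tuples stay in X."""
    return basket[offset:offset + n]


def basket_expr(basket: List[int], n: int) -> List[int]:
    """Bracketed sub-query [SELECT * FROM X TOP n]: the selected tuples
    are returned AND removed from the basket."""
    selected = basket[:n]
    del basket[:n]
    return selected


def q1_tumbling(x: List[int]) -> List[int]:
    # Q1: SELECT * FROM [SELECT * FROM X TOP 3] AS S WHERE S.a > 10;
    s = basket_expr(x, 3)
    return [a for a in s if a > 10]


def q2_sliding(x: List[int]) -> List[int]:
    # Q2: SELECT * FROM ([SELECT * FROM X TOP 1]
    #                    UNION SELECT * FROM X TOP 2 OFFSET 1) AS S WHERE S.a > 10;
    # Read the non-consumed part first so both sub-queries see the same snapshot.
    rest = take(x, 2, offset=1)
    head = basket_expr(x, 1)
    return [a for a in head + rest if a > 10]


x1 = [12, 3, 100, 14]
print(q1_tumbling(x1), x1)   # [12, 100] [14]

x2 = [12, 3, 100, 14]
print(q2_sliding(x2), x2)    # [12, 100] [3, 100, 14]
```

Q1 consumes its whole window of three (tumbling), while Q2 reads three tuples but removes only the oldest, so consecutive evaluations overlap by two tuples (a sliding window of size 3 and step 1).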
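A sketch of the separate-baskets strategy from slide 27, assuming a hypothetical copy factory that replicates basket b into one private list per query and then empties b. In MonetDB the replicas would be columns, which is what keeps copying cheap; Python lists do not model that cost.

```python
from typing import Callable, Dict, List


class CopyFactory:
    """Hypothetical Qcopy: copies every tuple of the input basket b
    into one private basket per query (bcopy1, bcopy2, ...)."""

    def __init__(self, query_names: List[str]) -> None:
        self.copies: Dict[str, List[int]] = {name: [] for name in query_names}

    def fire(self, b: List[int]) -> None:
        for basket in self.copies.values():
            basket.extend(b)      # one replica per interested factory
        b.clear()                 # assumption: b is emptied once copied


def run_factory(own: List[int], pred: Callable[[int], bool]) -> List[int]:
    """Each factory reads only its own basket and consumes all of it."""
    out = [t for t in own if pred(t)]
    own.clear()
    return out


b = [12, 3, 100, 14]
qcopy = CopyFactory(["Q1", "Q2", "Q3"])
qcopy.fire(b)

r1 = run_factory(qcopy.copies["Q1"], lambda a: a > 10)   # Q1 runs on its own copy
print(r1)                       # [12, 100, 14]
print(qcopy.copies["Q2"])       # [12, 3, 100, 14]  (not affected by Q1)
```

Because no basket is shared, the factories need no coordination, at the price of storing the data once per query.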
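For the shared-baskets strategy (slides 28-32), the sketch below keeps a single basket and approximates the locker and unlocker factories with a mutex plus one rule: a tuple is dropped only after every interested query has seen it. Both the per-query cursors and that dropping rule are assumptions made for illustration; the slides do not spell out the exact protocol, and `SharedBasket`, `locker`, `unlocker`, and `run_query` are hypothetical names.

```python
import threading
from typing import Callable, Dict, List


class SharedBasket:
    """Hypothetical shared basket b: one copy of the stream data, read by
    several factories; each query keeps a cursor to its next unread tuple."""

    def __init__(self, readers: List[str]) -> None:
        self.tuples: List[int] = []
        self.cursor: Dict[str, int] = {r: 0 for r in readers}
        self.lock = threading.Lock()

    def append(self, t: int) -> None:
        with self.lock:               # writers respect the lock too
            self.tuples.append(t)


def locker(b: SharedBasket) -> None:
    """FL_i: gain exclusive access to b before query i reads it."""
    b.lock.acquire()


def unlocker(b: SharedBasket) -> None:
    """FU_i: drop the tuples every query has already seen, then release b."""
    done = min(b.cursor.values())
    if done:
        del b.tuples[:done]
        for r in b.cursor:
            b.cursor[r] -= done
    b.lock.release()


def run_query(b: SharedBasket, name: str, pred: Callable[[int], bool]) -> List[int]:
    locker(b)
    try:
        new = b.tuples[b.cursor[name]:]     # only tuples this query has not seen
        b.cursor[name] = len(b.tuples)
        return [t for t in new if pred(t)]
    finally:
        unlocker(b)


b = SharedBasket(["Q1", "Q2", "Q3"])
for t in [12, 3, 100, 14]:
    b.append(t)

print(run_query(b, "Q1", lambda a: a > 10))    # [12, 100, 14]
print(run_query(b, "Q2", lambda a: a < 10))    # [3]
print(len(b.tuples))                           # 4 (Q3 has not read yet)
print(run_query(b, "Q3", lambda a: a == 100))  # [100]
print(len(b.tuples))                           # 0 (all three have seen them)
```

Compared with separate baskets, the data is stored once, and the extra cost moves into the lock and cleanup steps.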