SlideShare a Scribd company logo
1 of 14
DATA CUBE
COMPUTATION
Why data cube computation is needed?
• To retrieve the information from the data cube in the most efficient way
possible.

• Queries run on the cube will be fast.
Cube Materialization(precomputation)
Different Data Cube materialization include
1. Full cube
2. Iceberg cube
3. Closed cube
4. Shell cube
The Full cube
• The multi way array aggregation method computes full data cube by using
a multidimensional array as its basic data structure
1. Partition array into the chunks
2. Compute aggregate by visiting (i.e. accessing the values at) cube cells

Advantage

the queries run on the cube will be very fast.
Disadvantage

pre-computed cube requires a lot of memory.
An Iceberg-Cube
• contains only those cells of the data cube that meet an aggregate
condition.
• It is called an Iceberg-Cube because it contains only some of the cells of
the full cube, like the tip of an iceberg.
• The purpose of the Iceberg-Cube is to identify and compute only those
values that will most likely be required for decision support queries.
• The aggregate condition specifies which cube values are more
meaningful and should therefore be stored.
• This is one solution to the problem of computing versus storing data
cubes.
Advantage:

pre-compute only those cells in the cube which will most likely be used for
decision support queries.
A Closed Cube
A closed cube is a data cube consisting of only closed cells

Shell Cube
we can choose to precompute only portions or fragments of the
cube shell, based on cuboids of interest.
General strategies for data cube
computation
1. Sorting hashing and grouping
2. Simultaneous aggregation and caching intermediate
results
3. Aggregation from smallest child when there exist
multiple child cuboid
4. The Apriori pruning method can be explored to compute
iceberg cube efficiently
1. Sorting, hashing and grouping.
These operations facilitate aggregation, i.e. computation of the cells that share
the same set of dimension values.

These techniques can also perform:
o shared-sorts: sharing sorting costs across multiple cuboids
o share-partitions: sharing partitioning costs across multiple cuboids

Example:
To compute total sales by branch, day, and item, it is more efficient to sort tuples
or cells by branch, and then by day, and then group them according to the item
name.
2. Simultaneous aggregation and caching intermediate
results.
Reduce expensive disk I/O operations by computing higher-level
group-bys from computed lower-level group-bys.
These techniques can also perform:
o Amortized-scans: computing as many cuboids as possible at the
same time to reduce disk reads

Example:
To compute sales by branch, we can use the intermediate results derived from the
computation of a lower-level cuboid, such as sales by branch and day.
3. Aggregation from the smallest child.
If a parent ‘cuboid’ has more than one child, it is efficient to compute it
from the smallest previously computed child ‘cuboid’.
Example:
To compute a sales cuboid, Cbranch, when there exist two previously computed
cuboids, C{branch,year} and C{branch,tem}, it is obviously more efficient to compute
Cbranch from the former than from the latter if there are many more distinct items
than distinct years.
4. The Apriori pruning method can be explored to compute
iceberg cube efficiently
The Apriori property, in the context of data cubes, states as follow:
If given cell does not satisfy minimum support, then no descendant (i.e. more
specialized or detailed version ) of the cell will satisfy minimum support either.
This property can be used to substantially reduce the computation of iceberg
cubes.
Example:
Notice that because cell (a2, b2) is empty, it can be effectively discarded in
subsequent computations, based on the Apriori property.
Thank You…………

More Related Content

What's hot

Distributed design alternatives
Distributed design alternativesDistributed design alternatives
Distributed design alternativesPooja Dixit
 
management of distributed transactions
management of distributed transactionsmanagement of distributed transactions
management of distributed transactionsNilu Desai
 
Dynamic storage allocation techniques in Compiler design
Dynamic storage allocation techniques in Compiler designDynamic storage allocation techniques in Compiler design
Dynamic storage allocation techniques in Compiler designkunjan shah
 
20. Parallel Databases in DBMS
20. Parallel Databases in DBMS20. Parallel Databases in DBMS
20. Parallel Databases in DBMSkoolkampus
 
system interconnect architectures in ACA
system interconnect architectures in ACAsystem interconnect architectures in ACA
system interconnect architectures in ACAPankaj Kumar Jain
 
Understanding Bagging and Boosting
Understanding Bagging and BoostingUnderstanding Bagging and Boosting
Understanding Bagging and BoostingMohit Rajput
 
3.3 hierarchical methods
3.3 hierarchical methods3.3 hierarchical methods
3.3 hierarchical methodsKrish_ver2
 
Association Analysis in Data Mining
Association Analysis in Data MiningAssociation Analysis in Data Mining
Association Analysis in Data MiningKamal Acharya
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning Gopal Sakarkar
 
Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS  Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS Dr Neelesh Jain
 
Seven step model of migration into the cloud
Seven step model of migration into the cloudSeven step model of migration into the cloud
Seven step model of migration into the cloudRaj Raj
 
Data mining Measuring similarity and desimilarity
Data mining Measuring similarity and desimilarityData mining Measuring similarity and desimilarity
Data mining Measuring similarity and desimilarityRushali Deshmukh
 
3.7 outlier analysis
3.7 outlier analysis3.7 outlier analysis
3.7 outlier analysisKrish_ver2
 
1.7 data reduction
1.7 data reduction1.7 data reduction
1.7 data reductionKrish_ver2
 

What's hot (20)

Distributed design alternatives
Distributed design alternativesDistributed design alternatives
Distributed design alternatives
 
management of distributed transactions
management of distributed transactionsmanagement of distributed transactions
management of distributed transactions
 
Dynamic storage allocation techniques in Compiler design
Dynamic storage allocation techniques in Compiler designDynamic storage allocation techniques in Compiler design
Dynamic storage allocation techniques in Compiler design
 
20. Parallel Databases in DBMS
20. Parallel Databases in DBMS20. Parallel Databases in DBMS
20. Parallel Databases in DBMS
 
system interconnect architectures in ACA
system interconnect architectures in ACAsystem interconnect architectures in ACA
system interconnect architectures in ACA
 
Unit iv(simple code generator)
Unit iv(simple code generator)Unit iv(simple code generator)
Unit iv(simple code generator)
 
Understanding Bagging and Boosting
Understanding Bagging and BoostingUnderstanding Bagging and Boosting
Understanding Bagging and Boosting
 
3.3 hierarchical methods
3.3 hierarchical methods3.3 hierarchical methods
3.3 hierarchical methods
 
Association Analysis in Data Mining
Association Analysis in Data MiningAssociation Analysis in Data Mining
Association Analysis in Data Mining
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Distributed database
Distributed databaseDistributed database
Distributed database
 
Clusters techniques
Clusters techniquesClusters techniques
Clusters techniques
 
Distributed DBMS - Unit 1 - Introduction
Distributed DBMS - Unit 1 - IntroductionDistributed DBMS - Unit 1 - Introduction
Distributed DBMS - Unit 1 - Introduction
 
Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS  Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS
 
Seven step model of migration into the cloud
Seven step model of migration into the cloudSeven step model of migration into the cloud
Seven step model of migration into the cloud
 
Data mining Measuring similarity and desimilarity
Data mining Measuring similarity and desimilarityData mining Measuring similarity and desimilarity
Data mining Measuring similarity and desimilarity
 
3.7 outlier analysis
3.7 outlier analysis3.7 outlier analysis
3.7 outlier analysis
 
Distributed DBMS - Unit 6 - Query Processing
Distributed DBMS - Unit 6 - Query ProcessingDistributed DBMS - Unit 6 - Query Processing
Distributed DBMS - Unit 6 - Query Processing
 
1.7 data reduction
1.7 data reduction1.7 data reduction
1.7 data reduction
 
Distributed DBMS - Unit 5 - Semantic Data Control
Distributed DBMS - Unit 5 - Semantic Data ControlDistributed DBMS - Unit 5 - Semantic Data Control
Distributed DBMS - Unit 5 - Semantic Data Control
 

Viewers also liked

OLAP Cubes: Basic operations
OLAP Cubes: Basic operationsOLAP Cubes: Basic operations
OLAP Cubes: Basic operationsSthefan Berwanger
 
MS SQL SERVER: Olap cubes and data mining
MS SQL SERVER: Olap cubes and data miningMS SQL SERVER: Olap cubes and data mining
MS SQL SERVER: Olap cubes and data miningDataminingTools Inc
 
Olap Cube Design
Olap Cube DesignOlap Cube Design
Olap Cube Designh1m
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEZalpa Rathod
 
How I data mined my text message history
How I data mined my text message historyHow I data mined my text message history
How I data mined my text message historyJoe Cannatti Jr.
 
Odam: Open Data, Access and Mining
Odam: Open Data, Access and MiningOdam: Open Data, Access and Mining
Odam: Open Data, Access and MiningDaniel JACOB
 
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & Kambererror007
 
Data Mining: Concepts and techniques classification _chapter 9 :advanced methods
Data Mining: Concepts and techniques classification _chapter 9 :advanced methodsData Mining: Concepts and techniques classification _chapter 9 :advanced methods
Data Mining: Concepts and techniques classification _chapter 9 :advanced methodsSalah Amean
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining ConceptsDung Nguyen
 
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic ConceptsData Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic ConceptsSalah Amean
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsJustin Cletus
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Miningidnats
 
1.8 discretization
1.8 discretization1.8 discretization
1.8 discretizationKrish_ver2
 
Data Mining: Classification and analysis
Data Mining: Classification and analysisData Mining: Classification and analysis
Data Mining: Classification and analysisDataminingTools Inc
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for ClassificationPrakash Pimpale
 
Multi dimensional model vs (1)
Multi dimensional model vs (1)Multi dimensional model vs (1)
Multi dimensional model vs (1)JamesDempsey1
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesSaif Ullah
 

Viewers also liked (20)

OLAP Cubes: Basic operations
OLAP Cubes: Basic operationsOLAP Cubes: Basic operations
OLAP Cubes: Basic operations
 
MS SQL SERVER: Olap cubes and data mining
MS SQL SERVER: Olap cubes and data miningMS SQL SERVER: Olap cubes and data mining
MS SQL SERVER: Olap cubes and data mining
 
Olap Cube Design
Olap Cube DesignOlap Cube Design
Olap Cube Design
 
Datacube
DatacubeDatacube
Datacube
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSE
 
How I data mined my text message history
How I data mined my text message historyHow I data mined my text message history
How I data mined my text message history
 
Odam: Open Data, Access and Mining
Odam: Open Data, Access and MiningOdam: Open Data, Access and Mining
Odam: Open Data, Access and Mining
 
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
 
Data Mining: Concepts and techniques classification _chapter 9 :advanced methods
Data Mining: Concepts and techniques classification _chapter 9 :advanced methodsData Mining: Concepts and techniques classification _chapter 9 :advanced methods
Data Mining: Concepts and techniques classification _chapter 9 :advanced methods
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic ConceptsData Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and Correlations
 
Data visualization
Data visualizationData visualization
Data visualization
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Mining
 
1.8 discretization
1.8 discretization1.8 discretization
1.8 discretization
 
Data Mining: Classification and analysis
Data Mining: Classification and analysisData Mining: Classification and analysis
Data Mining: Classification and analysis
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for Classification
 
Multi dimensional model vs (1)
Multi dimensional model vs (1)Multi dimensional model vs (1)
Multi dimensional model vs (1)
 
Data Mining: Association Rules Basics
Data Mining: Association Rules BasicsData Mining: Association Rules Basics
Data Mining: Association Rules Basics
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniques
 

Similar to Data cube computation

Lecture 8 is for best and you should read
Lecture 8 is for best and you should readLecture 8 is for best and you should read
Lecture 8 is for best and you should readcentralcollegepkr
 
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 5
Data Mining:  Concepts and Techniques (3rd ed.)— Chapter 5 Data Mining:  Concepts and Techniques (3rd ed.)— Chapter 5
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 5 Salah Amean
 
Data Mining: Data cube computation and data generalization
Data Mining: Data cube computation and data generalizationData Mining: Data cube computation and data generalization
Data Mining: Data cube computation and data generalizationDatamining Tools
 
Chapter 5. Data Cube Technology.ppt
Chapter 5. Data Cube Technology.pptChapter 5. Data Cube Technology.ppt
Chapter 5. Data Cube Technology.pptSubrata Kumer Paul
 
Comparison between cube techniques
Comparison between cube techniquesComparison between cube techniques
Comparison between cube techniquesijsrd.com
 
Analysis of Allocation Algorithms in Memory Management
Analysis of Allocation Algorithms in Memory ManagementAnalysis of Allocation Algorithms in Memory Management
Analysis of Allocation Algorithms in Memory Managementijtsrd
 
A so common questions and answers
A so common questions and answersA so common questions and answers
A so common questions and answersAmit Sharma
 
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...Accumulo Summit
 
Understanding DLmalloc
Understanding DLmallocUnderstanding DLmalloc
Understanding DLmallocHaifeng Li
 
4 memory management bb
4   memory management bb4   memory management bb
4 memory management bbShahid Riaz
 
DATA MINING:Clustering Types
DATA MINING:Clustering TypesDATA MINING:Clustering Types
DATA MINING:Clustering TypesAshwin Shenoy M
 
Data Warehouse Implementation
Data Warehouse ImplementationData Warehouse Implementation
Data Warehouse Implementationomayva
 

Similar to Data cube computation (20)

Lecture 8 is for best and you should read
Lecture 8 is for best and you should readLecture 8 is for best and you should read
Lecture 8 is for best and you should read
 
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 5
Data Mining:  Concepts and Techniques (3rd ed.)— Chapter 5 Data Mining:  Concepts and Techniques (3rd ed.)— Chapter 5
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 5
 
Data Mining: Data cube computation and data generalization
Data Mining: Data cube computation and data generalizationData Mining: Data cube computation and data generalization
Data Mining: Data cube computation and data generalization
 
05 cubetech
05 cubetech05 cubetech
05 cubetech
 
Chapter 5. Data Cube Technology.ppt
Chapter 5. Data Cube Technology.pptChapter 5. Data Cube Technology.ppt
Chapter 5. Data Cube Technology.ppt
 
datacub
datacubdatacub
datacub
 
Comparison between cube techniques
Comparison between cube techniquesComparison between cube techniques
Comparison between cube techniques
 
Birch1
Birch1Birch1
Birch1
 
Advanced Trees
Advanced TreesAdvanced Trees
Advanced Trees
 
Analysis of Allocation Algorithms in Memory Management
Analysis of Allocation Algorithms in Memory ManagementAnalysis of Allocation Algorithms in Memory Management
Analysis of Allocation Algorithms in Memory Management
 
Birch
BirchBirch
Birch
 
Optimization in essbase
Optimization in essbaseOptimization in essbase
Optimization in essbase
 
Introduction to Bizur
Introduction to BizurIntroduction to Bizur
Introduction to Bizur
 
A so common questions and answers
A so common questions and answersA so common questions and answers
A so common questions and answers
 
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
 
05 cubetech
05 cubetech05 cubetech
05 cubetech
 
Understanding DLmalloc
Understanding DLmallocUnderstanding DLmalloc
Understanding DLmalloc
 
4 memory management bb
4   memory management bb4   memory management bb
4 memory management bb
 
DATA MINING:Clustering Types
DATA MINING:Clustering TypesDATA MINING:Clustering Types
DATA MINING:Clustering Types
 
Data Warehouse Implementation
Data Warehouse ImplementationData Warehouse Implementation
Data Warehouse Implementation
 

Recently uploaded

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Recently uploaded (20)

DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Data cube computation

  • 2. Why data cube computation is needed? • To retrieve the information from the data cube in the most efficient way possible. • Queries run on the cube will be fast.
  • 3. Cube Materialization(precomputation) Different Data Cube materialization include 1. Full cube 2. Iceberg cube 3. Closed cube 4. Shell cube
  • 4. The Full cube • The multi way array aggregation method computes full data cube by using a multidimensional array as its basic data structure 1. Partition array into the chunks 2. Compute aggregate by visiting (i.e. accessing the values at) cube cells Advantage the queries run on the cube will be very fast. Disadvantage pre-computed cube requires a lot of memory.
  • 5. An Iceberg-Cube • contains only those cells of the data cube that meet an aggregate condition. • It is called an Iceberg-Cube because it contains only some of the cells of the full cube, like the tip of an iceberg. • The purpose of the Iceberg-Cube is to identify and compute only those values that will most likely be required for decision support queries. • The aggregate condition specifies which cube values are more meaningful and should therefore be stored. • This is one solution to the problem of computing versus storing data cubes. Advantage: pre-compute only those cells in the cube which will most likely be used for decision support queries.
  • 6. A Closed Cube A closed cube is a data cube consisting of only closed cells Shell Cube we can choose to precompute only portions or fragments of the cube shell, based on cuboids of interest.
  • 7. General strategies for data cube computation 1. Sorting hashing and grouping 2. Simultaneous aggregation and caching intermediate results 3. Aggregation from smallest child when there exist multiple child cuboid 4. The Apriori pruning method can be explored to compute iceberg cube efficiently
  • 8. 1. Sorting, hashing and grouping. These operations facilitate aggregation, i.e. computation of the cells that share the same set of dimension values. These techniques can also perform: o shared-sorts: sharing sorting costs across multiple cuboids o share-partitions: sharing partitioning costs across multiple cuboids Example: To compute total sales by branch, day, and item, it is more efficient to sort tuples or cells by branch, and then by day, and then group them according to the item name.
  • 9. 2. Simultaneous aggregation and caching intermediate results. Reduce expensive disk I/O operations by computing higher-level group-bys from computed lower-level group-bys. These techniques can also perform: o Amortized-scans: computing as many cuboids as possible at the same time to reduce disk reads Example: To compute sales by branch, we can use the intermediate results derived from the computation of a lower-level cuboid, such as sales by branch and day.
  • 10. 3. Aggregation from the smallest child. If a parent ‘cuboid’ has more than one child, it is efficient to compute it from the smallest previously computed child ‘cuboid’. Example: To compute a sales cuboid, Cbranch, when there exist two previously computed cuboids, C{branch,year} and C{branch,tem}, it is obviously more efficient to compute Cbranch from the former than from the latter if there are many more distinct items than distinct years.
  • 11. 4. The Apriori pruning method can be explored to compute iceberg cube efficiently The Apriori property, in the context of data cubes, states as follow: If given cell does not satisfy minimum support, then no descendant (i.e. more specialized or detailed version ) of the cell will satisfy minimum support either. This property can be used to substantially reduce the computation of iceberg cubes.
  • 13. Notice that because cell (a2, b2) is empty, it can be effectively discarded in subsequent computations, based on the Apriori property.