Data Compression Techniques for Large Multidimensional Data Warehouses
1. 1
Data Compression for Large Multidimensional Data Warehouses
Supervisor:
Dr. K.M. Azharul Hasan, Associate Professor, Head of the Department, Department of CSE, KUET
Presented by:
Abdullah Al Mahmud, Roll: 0507006
Md. Mushfiqur Rahman, Roll: 0507029
These slides were prepared by Muhammad Mushfiqur Rahman and Abdullah Al Mahmud for their thesis presentation
3. 3
Objectives
Data compression technology:
reduces the effective price of logical data storage capacity
improves query performance
Multidimensional arrays are widely used in scientific research.
An efficient compression scheme for multidimensional arrays can handle the large multidimensional data sets of data warehouses
Muhammad Mushfiqur Rahman, Student ID: 0507029, CSE, KUET, Bangladesh
5. 5
Existing Compression Schemes (2/3)
(a) A sparse array. (b) The CRS scheme
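As a rough illustration of the CRS (Compressed Row Storage) idea in panel (b), the sketch below compresses a small 2-D sparse array into three vectors. The names RO (row offsets), CO (column indices) and VL (values) follow common CRS descriptions and are assumed here; the function names and the sample array are made up for the example and are not taken from the figure.

```python
def crs_compress(array):
    """Compress a 2-D list into CRS vectors (RO, CO, VL); zeros are dropped."""
    ro, co, vl = [0], [], []
    for row in array:
        for col, value in enumerate(row):
            if value != 0:
                co.append(col)    # column index of each non-zero
                vl.append(value)  # the non-zero value itself
        ro.append(len(vl))        # running count: where the next row starts
    return ro, co, vl


def crs_get(ro, co, vl, i, j):
    """Read element (i, j) back from the compressed form."""
    for k in range(ro[i], ro[i + 1]):
        if co[k] == j:
            return vl[k]
    return 0  # not stored, so it was a zero


# Hypothetical 3x4 sparse array used only for illustration.
sparse = [
    [0, 5, 0, 0],
    [0, 0, 0, 0],
    [7, 0, 0, 9],
]
RO, CO, VL = crs_compress(sparse)
```

In this toy case the 12 cells are stored as 4 + 3 + 3 = 10 numbers, so the saving only becomes noticeable for larger and sparser arrays.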
6. 6
Existing Compression Schemes (3/3)
Classical methods cannot support updates without completely readjusting the runs.
Schemes for compressing sparse arrays do not support extendibility
7. 7
Traditional Extendible Array
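TEA supports dynamic extension of dimension size.
Dimension 1 (rows): history table H1 = 0 2 4, address table A1 = 0 2 6
Dimension 2 (columns): history table H2 = 0 1 3 5, address table A2 = 0 1 4 9
Cell addresses in the array body:
0 1 4 9
2 3 5 10
6 7 8 11
The history counter starts at 0 and is incremented on each extension (0 to 5 in the figure).
Position <1,3>: H1[1] < H2[3], so the cell belongs to the sub-array of dimension 2
Address of Cell = A2[3] + 1 = 10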
Figure 1: TEA Construction And Access
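The construction and lookup in Figure 1 can be sketched for two dimensions as below. This is a minimal illustration, not the authors' code: the class name, the method names and the 0-based dimension numbering (dimension 0 for rows, dimension 1 for columns, i.e. H1 and H2 on the slide) are assumptions. Each extension increments the history counter, records it in the history table of the extended dimension, and records the first address of the new sub-array in that dimension's address table; a lookup uses the sub-array with the larger history value.

```python
class ExtendibleArray2D:
    """Two-dimensional extendible array with history and address tables."""

    def __init__(self):
        self.counter = 0              # history counter
        self.size = [1, 1]            # current length of each dimension
        self.history = [[0], [0]]     # history table per dimension
        self.address = [[0], [0]]     # first address of each sub-array
        self.total = 1                # number of cells allocated so far

    def extend(self, dim):
        """Grow dimension `dim` by one; the new sub-array is appended."""
        other = 1 - dim
        self.counter += 1
        self.history[dim].append(self.counter)
        self.address[dim].append(self.total)
        self.total += self.size[other]   # new sub-array spans the other dimension
        self.size[dim] += 1

    def cell_address(self, i, j):
        """Address of cell <i, j>: the later-created sub-array owns the cell."""
        if self.history[0][i] > self.history[1][j]:
            return self.address[0][i] + j
        return self.address[1][j] + i


# Replay the extensions of Figure 1: columns and rows alternately.
tea = ExtendibleArray2D()
for dim in (1, 0, 1, 0, 1):
    tea.extend(dim)
grid = [[tea.cell_address(i, j) for j in range(4)] for i in range(3)]
```

For position <1,3> the row history value 2 is smaller than the column history value 5, so `tea.cell_address(1, 3)` takes the column sub-array starting at address 9 and adds the row index 1.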
8. 8
Proposed Compression Scheme
Multidimensional arrays are important for sparse array operations
Extendibility of multidimensional arrays
A compression technique is needed that works on multidimensional extendible arrays
Our proposed compression scheme is EXCS (Extendible array based Compression Scheme)
9. 9
Extendible array based Compression Scheme (EXCS) 1/3
We implemented the multidimensional extendible array in secondary memory
We considered 3 dimensions in our experiments
The sub-arrays are kept distinct so that each can be stored individually in secondary memory
10. 10
Extendible array based Compression Scheme (EXCS) 2/3
The sub-arrays have n-1 (= 2) dimensions
A large number of sub-arrays is generated for compression
Sub-arrays are taken as input dynamically
Only the maximum number of sub-arrays has to be given
11. 11
Extendible array based Compression Scheme (EXCS) 3/3
Each sub-array is compressed individually
The compression technique used is similar to CRS
The compressed elements are written to secondary memory as the RO, CO, VL of subarray_1, subarray_2, ..., subarray_N
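The three EXCS slides can be summarised by the following sketch, which is an illustration under stated assumptions rather than the authors' implementation. It slices a 3-D array into 2-D sub-arrays along the first dimension, compresses each with a CRS-like routine, and appends the RO, CO and VL vectors of each sub-array to a file-like object standing in for secondary memory. The function names, the one-JSON-record-per-line layout and the choice of slicing axis are all hypothetical.

```python
import io
import json


def crs_like(sub_array):
    """CRS-like compression of one 2-D sub-array into RO, CO, VL."""
    ro, co, vl = [0], [], []
    for row in sub_array:
        for col, value in enumerate(row):
            if value != 0:
                co.append(col)
                vl.append(value)
        ro.append(len(vl))
    return ro, co, vl


def write_sub_arrays(array_3d, out, max_sub_arrays):
    """Compress each 2-D slice on its own and write one record per slice."""
    written = 0
    for index, sub_array in enumerate(array_3d):
        if index >= max_sub_arrays:   # only the maximum count is known up front
            break
        ro, co, vl = crs_like(sub_array)
        record = {"subarray": index + 1, "RO": ro, "CO": co, "VL": vl}
        out.write(json.dumps(record) + "\n")
        written += 1
    return written


# Hypothetical 2x2x3 array; io.StringIO stands in for a file on disk.
data = [
    [[0, 0, 4], [0, 0, 0]],
    [[1, 0, 0], [0, 2, 0]],
]
storage = io.StringIO()
count = write_sub_arrays(data, storage, max_sub_arrays=10)
records = [json.loads(line) for line in storage.getvalue().splitlines()]
```

Because every sub-array is a separate record, a newly added sub-array can be appended without rewriting the ones already stored, which is the property the extendible layout is meant to give.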
12. 12
Performance Measurement
Performance is measured against two key factors of the compression schemes:
Data density
Length of dimension / number of data
compression ratio = compressed data size / original data size
space savings = 1 - compression ratio
We report space savings in percent
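The two formulas above can be tried out with a small worked example. The sketch assumes a plain CRS layout in which RO has one entry per row plus one and CO and VL have one entry per non-zero, and it counts every stored number as one unit; the function names and the 8x8 example are hypothetical and are not taken from the experiments.

```python
def space_savings_percent(original_size, compressed_size):
    """Space savings in percent, from the two formulas on the slide."""
    compression_ratio = compressed_size / original_size
    return (1 - compression_ratio) * 100


def crs_size(rows, non_zeros):
    """Numbers stored by plain CRS: RO (rows + 1) plus CO and VL (one per non-zero)."""
    return (rows + 1) + 2 * non_zeros


# Hypothetical 8x8 array (64 cells).
original = 8 * 8
sparse_case = space_savings_percent(original, crs_size(8, 13))  # about 20% dense
dense_case = space_savings_percent(original, crs_size(8, 32))   # 50% dense
```

Under these assumptions the sparse case saves about 45%, while the dense case comes out at about -14%, meaning the compressed form is larger than the original. This is how a space-savings value can fall below zero.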
13. 13
Comparative Analysis (1/4)
Figure: Comparison with fixed density = 20%
(Chart: space savings versus no. of data, plotted at 64, 729, 4096, 15625 and 46656 for the Header, Bitmap, CRS, EACRS and Offset schemes; vertical axis from -40 to 100.)
14. 14
Comparative Analysis (2/4)
Figure: Comparison with fixed density = 25%
(Chart: space savings versus no. of data, plotted at 64, 729, 4096, 15625 and 46656 for the Header, Bitmap, CRS, EACRS and Offset schemes; vertical axis from -40 to 80.)
15. 15
Comparative Analysis (3/4)
Figure: Comparison with fixed no. of data = 64
(Chart: vertical axis labelled "compression ratio", from -60 to 100, versus density of data, plotted at 10, 20, 30, 40 and 50 for the Header, Bitmap, CRS, EACRS and Offset schemes.)
16. 16
Comparative Analysis (4/4)
Figure: Comparison with fixed no. of data = 4096
(Chart: vertical axis labelled "compression ratio", from -60 to 100, versus density of data, plotted at 10, 20, 30, 40 and 50 for the Header, Bitmap, CRS, EACRS and Offset schemes.)
17. 17
Performance Measurement
Extendibility of arrays
Using multidimensional arrays
Extendibility toward any dimension
EXCS allows dynamic extension of arrays.
In the analysis, the data can be extended up to n dimensions
Performance is good for large numbers of data
18. 18
Conclusion
Our proposed compression scheme has been tested experimentally on data of up to 3 dimensions
In future, the experiments can be extended to compressing n-dimensional data.
EXCS is effective for large multidimensional data warehouses