1


      Data Compression for Large
      Multidimensional Data
      Warehouses



Supervisor:
Dr. K.M. Azharul Hasan
Associate Professor and Head of the Department
Department of CSE, KUET

Presented by:
Abdullah Al Mahmud, Roll: 0507006
Md. Mushfiqur Rahman, Roll: 0507029

These slides were prepared by Muhammad Mushfiqur Rahman and Abdullah Al Mahmud for their thesis presentation.
2


Presentation Layout

- Objectives
- Existing Compression Schemes
- Traditional Extendible Array
- Proposed Compression Scheme
- EXCS (Extendible Array Based Compression Scheme)
- Comparative Analysis
- Conclusion

   Muhammad Mushfiqur Rahman, Student ID: 0507029, CSE, KUET, Bangladesh
3



Objectives
Data compression technology:
- reduces the effective price of logical data storage capacity
- improves query performance

- Multidimensional arrays are widely used in scientific research.
- Efficient compression of multidimensional arrays makes it possible to handle the large multidimensional data sets of data warehouses.

4



Existing Compression Schemes (1/3)

- Bitmap compression
- Run Length Encoding
- Header compression
- Compressed Column Storage (CCS)
- Compressed Row Storage (CRS)
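To make two of the classical schemes listed above concrete, here is a minimal Python sketch that compresses a linearized (row-major) sparse array with run-length encoding and with a bitmap plus a vector of non-zero values. The function names are hypothetical, and treating zero as the empty ("null") value is an assumption; the slides do not specify the exact encodings used in the experiments.

```python
from typing import List, Tuple


def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Run-length encode a sequence as (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((v, 1))  # start a new run
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, run_length) pairs back into the original sequence."""
    out: List[int] = []
    for v, n in runs:
        out.extend([v] * n)
    return out


def bitmap_compress(values: List[int]) -> Tuple[List[int], List[int]]:
    """Return a 0/1 bitmap of non-zero positions and the non-zero values in order.

    One int per bit here for readability; a real bitmap packs 8 bits per byte.
    """
    bitmap = [1 if v != 0 else 0 for v in values]
    nonzeros = [v for v in values if v != 0]
    return bitmap, nonzeros


def bitmap_decompress(bitmap: List[int], nonzeros: List[int]) -> List[int]:
    """Rebuild the dense sequence, consuming one stored value per set bit."""
    it = iter(nonzeros)
    return [next(it) if bit else 0 for bit in bitmap]


if __name__ == "__main__":
    data = [0, 0, 0, 7, 0, 0, 3, 3, 0, 0]
    print(rle_encode(data))       # [(0, 3), (7, 1), (0, 2), (3, 2), (0, 2)]
    print(bitmap_compress(data))  # ([0, 0, 0, 1, 0, 0, 1, 1, 0, 0], [7, 3, 3])
```

RLE pays off when long runs of equal values exist, while the bitmap costs one bit per cell regardless of density, which is why the two behave differently as density changes.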




5



Existing Compression Schemes (2/3)

Figure: (a) A sparse array. (b) The CRS scheme.
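The CRS layout in the figure can be sketched as follows. This is a textbook Compressed Row Storage, assuming zero marks an empty cell; the sample matrix is made up and is not the array shown in the figure. The non-zeros of row i are stored at positions `row_ptr[i]` up to (but not including) `row_ptr[i + 1]`.

```python
from typing import List, Tuple


def crs_compress(matrix: List[List[int]]) -> Tuple[List[int], List[int], List[int]]:
    """Compressed Row Storage: row pointers, column indices, and non-zero values."""
    row_ptr = [0]
    col_idx: List[int] = []
    values: List[int] = []
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                values.append(v)
        row_ptr.append(len(values))  # end of this row == start of the next row
    return row_ptr, col_idx, values


def crs_get(row_ptr: List[int], col_idx: List[int], values: List[int], i: int, j: int) -> int:
    """Look up element (i, j) by scanning only row i; absent entries are zero."""
    for k in range(row_ptr[i], row_ptr[i + 1]):
        if col_idx[k] == j:
            return values[k]
    return 0


def crs_decompress(row_ptr: List[int], col_idx: List[int], values: List[int], n_cols: int) -> List[List[int]]:
    """Rebuild the dense matrix from its CRS arrays."""
    matrix = []
    for i in range(len(row_ptr) - 1):
        row = [0] * n_cols
        for k in range(row_ptr[i], row_ptr[i + 1]):
            row[col_idx[k]] = values[k]
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    a = [
        [0, 5, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [3, 0, 0, 8, 0],
        [0, 0, 2, 0, 1],
    ]
    print(crs_compress(a))  # ([0, 1, 1, 3, 5], [1, 0, 3, 2, 4], [5, 3, 8, 2, 1])
```

An empty row simply repeats the previous row pointer (row 1 above), so CRS spends no space on it.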




6


Existing Compression Schemes (3/3)

- Classical methods cannot support updates without completely readjusting their runs.
- They are designed for compressing sparse arrays.
- They do not support extendibility.
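The update problem of run-based schemes can be seen in a small sketch. It assumes runs are stored as `(value, run_length)` pairs; the helper `rle_set` is hypothetical. Writing a single cell inside a run splits that run into up to three runs (or merges it with its neighbours), so every run stored after the change moves to a new position in storage.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (value, run_length)


def rle_set(runs: List[Run], index: int, value: int) -> List[Run]:
    """Return a new run list after setting logical position `index` to `value`."""
    total = sum(n for _, n in runs)
    if not 0 <= index < total:
        raise IndexError("index out of range")
    out: List[Run] = []
    start = 0
    for v, n in runs:
        end = start + n
        if start <= index < end and v != value:
            # Split the run containing `index` into (before, updated cell, after).
            if index > start:
                out.append((v, index - start))
            out.append((value, 1))
            if end - index - 1 > 0:
                out.append((v, end - index - 1))
        else:
            out.append((v, n))
        start = end
    # Re-merge neighbouring runs that now hold the same value.
    merged: List[Run] = []
    for v, n in out:
        if merged and merged[-1][0] == v:
            merged[-1] = (v, merged[-1][1] + n)
        else:
            merged.append((v, n))
    return merged


if __name__ == "__main__":
    print(rle_set([(0, 10)], 4, 9))  # [(0, 4), (9, 1), (0, 5)]: one run became three
    print(rle_set([(0, 3), (7, 1), (0, 2), (3, 2), (0, 2)], 3, 0))  # [(0, 6), (3, 2), (0, 2)]
```

In a file-based layout, the number of runs changing means the tail of the compressed data must be rewritten, which is the readjustment the slide refers to.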




7


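Traditional Extendible Array

- TEA supports dynamic extension of dimension sizes.

Figure 1: TEA Construction and Access

History table, dimension 2 (H2):   0   1   3   5
Address table, dimension 2:        0   1   4   9

 H1   Address1 | cell addresses
  0      0     |  0   1   4   9
  2      2     |  2   3   5  10
  4      6     |  6   7   8  11

Access to position <1,3>:
- H1[1] = 2 < H2[3] = 5, so the element lies in the sub-array added when dimension 2 was extended to index 3.
- Cell = Address2[3] + 1 = 9 + 1 = 10 (offset 1 is the row index within that sub-array).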

8


Proposed Compression Scheme
- Multidimensional arrays are important for sparse array operations.
- Extendibility of multidimensional arrays
- A compression technique that can work on a multidimensional extendible array
- Our proposed compression scheme is EXCS (Extendible Array Based Compression Scheme).
9


Extendible Array Based Compression Scheme (EXCS) (1/3)

- We implemented the multidimensional extendible array in secondary memory.
- We considered dimension = 3 in our experiments.
- The sub-arrays are distinguished so that each one can be stored individually in secondary memory.

10


Extendible Array Based Compression Scheme (EXCS) (2/3)

- The sub-arrays are of n - 1 (= 2) dimensions.
- A large number of sub-arrays are generated to be compressed.
- Sub-arrays are taken dynamically as input.
- Only the maximum number of sub-arrays needs to be given.
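The sketch below lists the 2-D sub-arrays that a 3-D extendible array produces when only the maximum number of sub-arrays is given. Extending dimension d adds a sub-array whose shape is the current size of the other two dimensions. The round-robin extension policy and the name `subarray_shapes` are assumptions for illustration; the slides do not say in which order the dimensions were extended.

```python
from itertools import cycle
from typing import Iterator, Tuple


def subarray_shapes(max_subarrays: int, n_dims: int = 3) -> Iterator[Tuple[int, Tuple[int, ...]]]:
    """Yield (extended_dimension, shape) for each (n-1)-D sub-array of an n-D TEA.

    Assumes the array starts as 1 x 1 x ... x 1 and is extended round-robin over
    the dimensions (a hypothetical policy used only for illustration).
    """
    sizes = [1] * n_dims
    for _, d in zip(range(max_subarrays), cycle(range(n_dims))):
        # The new sub-array spans the current sizes of every other dimension.
        shape = tuple(s for k, s in enumerate(sizes) if k != d)
        sizes[d] += 1
        yield d, shape


if __name__ == "__main__":
    for d, shape in subarray_shapes(6):
        print(f"extend dimension {d}: new {shape[0]} x {shape[1]} sub-array")
```

Under this policy, 9 sub-arrays plus the initial cell give a 4 x 4 x 4 array of 64 elements, and each sub-array can be handed to the compressor as soon as it is generated.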
11


Extendible Array Based Compression Scheme (EXCS) (3/3)

- Each sub-array is compressed individually.
- The compression technique used is similar to CRS.
- The compressed elements are written to secondary memory as RO, CO, VL of subarray_1, subarray_2, … subarray_N.
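A minimal sketch of the EXCS write path as described above: each 2-D sub-array is compressed on its own in CRS style, and its RO, CO, VL arrays are written one sub-array after another. Reading RO, CO, VL as row offsets, column indices and values, the 32-bit little-endian integer encoding, and the length prefixes in front of each array are all assumptions; the slides do not give the on-disk format. An in-memory `io.BytesIO` stands in for the secondary-memory file.

```python
import io
import struct
from typing import BinaryIO, List, Tuple

Matrix = List[List[int]]
CRS = Tuple[List[int], List[int], List[int]]  # (RO, CO, VL)


def crs(sub: Matrix) -> CRS:
    """CRS-style compression of one 2-D sub-array into (RO, CO, VL)."""
    ro: List[int] = [0]
    co: List[int] = []
    vl: List[int] = []
    for row in sub:
        for j, v in enumerate(row):
            if v != 0:
                co.append(j)
                vl.append(v)
        ro.append(len(vl))
    return ro, co, vl


def _write_ints(stream: BinaryIO, xs: List[int]) -> None:
    # Length-prefixed block of signed 32-bit little-endian integers (assumed format).
    stream.write(struct.pack("<I", len(xs)))
    stream.write(struct.pack(f"<{len(xs)}i", *xs))


def _read_ints(stream: BinaryIO) -> List[int]:
    (n,) = struct.unpack("<I", stream.read(4))
    return list(struct.unpack(f"<{n}i", stream.read(4 * n)))


def write_excs(stream: BinaryIO, subarrays: List[Matrix]) -> None:
    """Write RO, CO, VL of subarray_1 ... subarray_N one after another."""
    stream.write(struct.pack("<I", len(subarrays)))
    for sub in subarrays:
        for part in crs(sub):  # RO, then CO, then VL
            _write_ints(stream, part)


def read_excs(stream: BinaryIO) -> List[CRS]:
    """Read back the (RO, CO, VL) triple of every sub-array in file order."""
    (n,) = struct.unpack("<I", stream.read(4))
    return [tuple(_read_ints(stream) for _ in range(3)) for _ in range(n)]


if __name__ == "__main__":
    buf = io.BytesIO()
    write_excs(buf, [[[0, 4], [0, 0]], [[1, 0, 0], [0, 0, 2]]])
    buf.seek(0)
    print(read_excs(buf))  # [([0, 1, 1], [1], [4]), ([0, 1, 2], [0, 2], [1, 2])]
```

Because every sub-array is compressed independently, a newly generated sub-array can be appended to the end of the file without touching the ones already written.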
12


Performance Measurement

Performance is measured with respect to two key factors of the compression schemes:
- Data density
- Length of dimension / number of data elements

- Compression ratio = compressed data size / original data size
- Space savings = 1 - compression ratio
- We report space savings in percent.
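As a worked example of the two formulas, the sketch below computes space savings for CRS on an 8 x 8 array, assuming every row offset, column index and value occupies one word and the uncompressed array stores all 64 cells. These sizes are illustrative and are not the thesis measurements. The second case shows that space savings become negative whenever the compressed form is larger than the original.

```python
def compression_ratio(compressed_size: float, original_size: float) -> float:
    """compression ratio = compressed data / original data"""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return compressed_size / original_size


def space_savings_percent(compressed_size: float, original_size: float) -> float:
    """space savings = 1 - compression ratio, reported in percent"""
    return (1.0 - compression_ratio(compressed_size, original_size)) * 100.0


def crs_size_words(n_rows: int, nnz: int) -> int:
    """Illustrative CRS size: (n_rows + 1) row offsets + nnz column indices + nnz values."""
    return (n_rows + 1) + 2 * nnz


if __name__ == "__main__":
    original = 8 * 8  # dense 8 x 8 array, one word per cell (assumption)
    for nnz in (12, 40):
        compressed = crs_size_words(8, nnz)
        print(nnz, compressed, round(space_savings_percent(compressed, original), 2))
```

With 12 non-zeros CRS needs 9 + 24 = 33 words, so the compression ratio is 33/64 and the space savings are about 48.4%; with 40 non-zeros it needs 89 words and the space savings are about -39.1%.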
13


Comparative Analysis (1/4)

Figure: Comparison with fixed density = 20% (y-axis: space savings, -40 to 100; x-axis: no. of data = 64, 729, 4096, 15625, 46656; schemes: Header, Bitmap, CRS, EACRS, Offset)
14


Comparative Analysis (2/4)

Figure: Comparison with fixed density = 25% (y-axis: space savings, -40 to 80; x-axis: no. of data = 64, 729, 4096, 15625, 46656; schemes: Header, Bitmap, CRS, EACRS, Offset)
15


Comparative Analysis (3/4)

Figure: Comparison with fixed no. of data = 64 (y-axis: compression ratio, -60 to 100; x-axis: density of data = 10, 20, 30, 40, 50; schemes: Header, Bitmap, CRS, EACRS, Offset)
16


Comparative Analysis (4/4)

Figure: Comparison with fixed no. of data = 4096 (y-axis: compression ratio, -60 to 100; x-axis: density of data = 10, 20, 30, 40, 50; schemes: Header, Bitmap, CRS, EACRS, Offset)
17



Performance Measurement

- Extendibility of arrays
- Use of multidimensional arrays
- Extendibility along any dimension
- EXCS allows dynamic extension of arrays.
- In the analysis, data can be extended up to n dimensions.
- Performance is good for large numbers of data elements.


18



Conclusion
- Our proposed compression scheme has been evaluated experimentally on data of up to 3 dimensions.
- In future work, it can be extended and evaluated for compressing n-dimensional data.
- EXCS is effective for large multidimensional data warehouses.


