SAX-TimeSeries

  1. Symbolic Representations of Time Series - Nikita
     Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Internal/Restricted/Highly Restricted
  2. Time Series
     • A time series is a sequence of pairs
       - Each pair consists of a Time Index and a Value
       - The Time Index may be implied if the difference between consecutive Time Indices is constant
     • The time series can be segmented into "Windows", each of which represents the time series between 2 Time Indices
     • Symbols can represent Windows. Because symbols in a Finite Symbol Space have a probability, we can think of the probability of a time series. Symbols are easy to store and manipulate: each symbol can be represented as an integer
     [Figure: example time series plot]
  3. Data Mining Constraints
     For example, suppose you have 1 GB of main memory and want to do K-means clustering:
     • Clustering ¼ GB of data: 100 sec
     • Clustering ½ GB of data: 200 sec
     • Clustering 1 GB of data: 400 sec
     • Clustering 1.1 GB of data: a few hours
     Run time grows linearly while the data fits in main memory, then jumps from minutes to hours once it no longer fits.
  4. Generic Data Mining
     • Create an approximation of the data that fits in main memory yet retains the essential features of interest
     • Approximately solve the problem at hand in main memory
     • Make (hopefully very few) accesses to the original data on disk to confirm the solution
  5. Some Common Approximations
  6. The Symbolic Representation Of Time Series
     • A number of algorithms exist to represent time series as symbols in a Finite Symbol Space
       - These algorithms are often thought of as "Feature Reducers"
     • Self Organizing Maps are a traditional form of Feature Reducer
     • SAX (Symbolic Aggregate approXimation) is another, designed specifically for time series
     • There are many other ways to reduce a time series to a symbol
       - As long as the symbol is drawn from a Finite Symbol Space, the technique described here will work
  7. What is SAX?
     • SAX is a methodology for reducing a time series window to a symbol
     • The technique was developed by Dr. Eamonn Keogh et al. at the University of California, Riverside in the early 2000s
     • It has since drawn a great deal of attention in the world of time series analysis
     • It allows a time series of arbitrary length n to be reduced to a string of arbitrary length w (w << n)
     • SAX is the first symbolic representation for time series that allows for dimensionality reduction and indexing with a lower-bounding distance measure
  8. What is lower bounding?
     • Lower bounding means that for all Q and S, we have $D_{LB}(Q', S') \le D(Q, S)$, where Q' and S' are the reduced representations of Q and S, D is the true distance on the original series, and $D_{LB}$ is the distance measured on the reduced representations.
  9. What's a SAX Word?
     • A SAX word is the symbol generated by the SAX algorithm
     • It is defined by a SAX Alphabet and a length
       - The SAX Alphabet is traditionally represented by letters, and its components are referred to as "SAX Letters"
       - The size of the alphabet is typically small; this is particularly important for anomaly detection
     • When we write out a description of a SAX word, we typically use a string-like representation, such as "abcdefg"
       - SAX letters don't have to be letters. Implementations often use numbers starting at zero, but we usually display them as letters
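The relationship between integer SAX letters and their displayed form can be illustrated with a small Python sketch. The function name `to_letters` and the index values are hypothetical, chosen only for illustration; the slides do not prescribe an implementation.

```python
def to_letters(indices):
    """Render zero-based SAX letter indices as a lowercase string.

    Assumes an alphabet of at most 26 letters, so that index 0 maps to "a",
    index 1 to "b", and so on.
    """
    if any(i < 0 or i > 25 for i in indices):
        raise ValueError("this sketch supports alphabet sizes up to 26")
    return "".join(chr(ord("a") + i) for i in indices)


# Illustrative indices for a word of length 8 over a 3-letter alphabet.
word = to_letters([1, 0, 0, 1, 2, 2, 1, 2])
print(word)
```

Storing the indices and converting only for display keeps comparisons and table lookups cheap.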
  10. Symbolic Aggregate ApproXimation
     • Lower bounding of Euclidean distance
     • Dimensionality Reduction
     • Numerosity Reduction
     Example SAX word: baabccbc
  11. Normalization of Time Series
     • Normalization to Zero Mean and Unit Energy.
     • The procedure transforms the input vector into an output vector whose mean is approximately 0 and whose standard deviation is close to 1. The formula behind the transform is $x'_i = (x_i - \mu) / \sigma$, where $\mu$ and $\sigma$ are the mean and standard deviation of the input vector.
     • Z-normalization is an essential preprocessing step that allows an algorithm to focus on structural similarities and dissimilarities rather than on amplitude. To make meaningful comparisons between two time series, both must be normalized.
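As a minimal sketch of the z-normalization step described on this slide, the following Python function rescales a series to zero mean and unit standard deviation. The use of the population standard deviation and the `eps` guard for near-constant series are assumptions of this sketch, not details taken from the slides.

```python
import math


def z_normalize(values, eps=1e-8):
    """Return a copy of `values` with zero mean and unit standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (divides by n); an assumption of this sketch.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std < eps:
        # A constant series has no structure to preserve; map it to zeros.
        return [0.0] * n
    return [(v - mean) / std for v in values]


series = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, standard deviation 2
normalized = z_normalize(series)
print(normalized)
```

After this step, two series that differ only in offset and scale map to the same normalized values.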
  13. How to obtain SAX?
     • Data is divided into w equal-sized frames.
     • The mean value of the data falling within each frame is calculated
     • The vector of these values becomes the PAA
     [Figure: time series over time indices 0 to 120, divided into frames labeled with SAX letters]
  14. How to obtain SAX?
     Step 1: Reduce dimension by PAA
     A time series C of length n can be represented in a w-dimensional space by a vector Ć = ć_1, …, ć_w. The i-th element is calculated by

     $$\acute{c}_i = \frac{w}{n} \sum_{j=\frac{n}{w}(i-1)+1}^{\frac{n}{w} i} c_j$$
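The PAA reduction in Step 1 can be sketched in a few lines of Python. This sketch assumes that n is divisible by w, which the formula on the slide implicitly requires; the function name `paa` is illustrative.

```python
def paa(series, w):
    """Piecewise Aggregate Approximation: the mean of each of w equal frames."""
    n = len(series)
    if w <= 0 or n % w != 0:
        raise ValueError("this sketch assumes n is a positive multiple of w")
    frame = n // w  # number of points per frame, n / w
    # Frame i (zero-based) covers series[i*frame : (i+1)*frame], which matches
    # the one-based index range (n/w)(i-1)+1 .. (n/w)i in the formula.
    return [sum(series[i * frame:(i + 1) * frame]) / frame for i in range(w)]


c = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 0.0, 2.0]
c_paa = paa(c, 4)  # frames: [1, 3], [2, 6], [5, 5], [0, 2]
print(c_paa)
```

Handling a length n that is not a multiple of w needs fractional weighting of boundary points, which is left out here.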
  15. How to obtain SAX?
     Step 2: Discretization
     • Normalize Ć to have a Gaussian distribution
     • Determine the breakpoints that produce a equal-sized areas under the Gaussian curve, where a is the alphabet size
     [Figure: time series over time indices 0 to 120 with each frame labeled by its SAX letter]
     Resulting word: baabccbc (word length: 8, alphabet size: 3)
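The discretization step can be sketched with the Python standard library alone. The breakpoints are the standard normal quantiles at 1/a, 2/a, …, (a-1)/a, which split the area under the Gaussian curve into a equal parts. The function names and the sample input values are assumptions made for illustration, and a value that falls exactly on a breakpoint is assigned to the upper letter.

```python
from bisect import bisect_right
from statistics import NormalDist


def breakpoints(a):
    """Return the a-1 cut points giving a equal-area regions under N(0, 1)."""
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf(k / a) for k in range(1, a)]


def discretize(paa_values, a):
    """Map each (z-normalized) PAA value to a letter: 'a' is the lowest region."""
    if not 2 <= a <= 26:
        raise ValueError("this sketch supports alphabet sizes from 2 to 26")
    cuts = breakpoints(a)
    # bisect_right counts how many breakpoints are <= v, i.e. the region index.
    return "".join(chr(ord("a") + bisect_right(cuts, v)) for v in paa_values)


cuts3 = breakpoints(3)  # roughly [-0.43, 0.43]
word = discretize([-1.0, 0.0, 0.9], 3)
print(cuts3, word)
```

Because the regions have equal probability under the Gaussian assumption, each letter is expected to occur about equally often.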
  17. Gaussian distribution
     • Most "natural" distributions
     • A Gaussian process uses lazy learning and a measure of the similarity between points (this is the kernel function) to predict the value for an unseen point from training data
     Ref: https://www.isixsigma.com/tools-templates/normality/tips-recognizing-and-transforming-non-normal-data/
  18. Distance Measure
     • Given 2 time series Q and C
       - Euclidean distance
       - Distance after transforming the subsequence to PAA
  19. Distance Measure
     • Given 2 time series Q and C
       - Euclidean distance
       - Distance after transforming the subsequence to PAA
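The formulas for these two distances were images on the slides and are not part of the transcript. The sketch below assumes the usual definitions: the Euclidean distance on the raw series, and for PAA the Euclidean distance between the two PAA vectors scaled by sqrt(n/w). The sample data and function names are illustrative, and n is assumed to be divisible by w.

```python
import math


def euclidean(q, c):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((qi - ci) ** 2 for qi, ci in zip(q, c)))


def paa(series, w):
    """Mean of each of w equal frames (assumes len(series) is a multiple of w)."""
    frame = len(series) // w
    return [sum(series[i * frame:(i + 1) * frame]) / frame for i in range(w)]


def paa_distance(q, c, w):
    """Distance between PAA representations, scaled by sqrt(n / w)."""
    n = len(q)
    q_paa, c_paa = paa(q, w), paa(c, w)
    return math.sqrt(n / w) * math.sqrt(
        sum((a - b) ** 2 for a, b in zip(q_paa, c_paa))
    )


q = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 0.0, 2.0]
c = [2.0, 2.0, 4.0, 4.0, 1.0, 7.0, 3.0, 1.0]
d_true = euclidean(q, c)      # distance on the original series
d_paa = paa_distance(q, c, 4)  # distance on the reduced representation
print(d_paa, d_true, d_paa <= d_true)
```

For this pair the PAA distance is smaller than the Euclidean distance, which is the lower-bounding property in action: averaging within frames can hide differences but cannot create them.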
  20. Distance Measure
     • Define MINDIST after transforming to symbolic representation
     • MINDIST lower bounds the true distance between the original time series
     Example words: $\hat{C}$ = baabccbc, $\hat{Q}$ = babcacca

     $$MINDIST(\hat{Q}, \hat{C}) = \sqrt{\frac{n}{w}} \sqrt{\sum_{i=1}^{w} \left( dist(\hat{q}_i, \hat{c}_i) \right)^2}$$

     dist() can be implemented using a table lookup.
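A table-lookup implementation of dist() and MINDIST might look like the following Python sketch. It assumes the usual SAX lookup rule, which the transcript does not spell out: identical or adjacent letters have distance 0, and otherwise the distance is the gap between the breakpoints that separate the two letters. The original series length n = 128 used in the example is also an assumption; the slides do not state it.

```python
import math
from statistics import NormalDist


def breakpoints(a):
    """Cut points giving a equal-area regions under the standard normal curve."""
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf(k / a) for k in range(1, a)]


def dist_table(a):
    """Build the a-by-a lookup table used by dist()."""
    cuts = breakpoints(a)
    table = [[0.0] * a for _ in range(a)]
    for r in range(a):
        for c in range(a):
            if abs(r - c) > 1:
                # Gap between the breakpoint below the higher letter and
                # the breakpoint above the lower letter.
                table[r][c] = cuts[max(r, c) - 1] - cuts[min(r, c)]
    return table


def mindist(q_word, c_word, n, a):
    """MINDIST between two SAX words of equal length w for series of length n."""
    if len(q_word) != len(c_word):
        raise ValueError("SAX words must have the same length")
    w = len(q_word)
    table = dist_table(a)
    total = sum(
        table[ord(x) - ord("a")][ord(y) - ord("a")] ** 2
        for x, y in zip(q_word, c_word)
    )
    return math.sqrt(n / w) * math.sqrt(total)


# Words from the slide; n = 128 is an assumed original series length.
d = mindist("babcacca", "baabccbc", n=128, a=3)
print(d)
```

Only the two positions where "a" meets "c" contribute to the sum; every other pair of letters is identical or adjacent and so has distance 0.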
  21. Novelty Detection
     • Fault detection
     • Interestingness detection
     • Anomaly detection
     • Surprisingness detection
