The document discusses symbolic representations of time series data using techniques like SAX (Symbolic Aggregate approXimation). It provides details on:
- Representing time series as sequences of time-value pairs that can be segmented into windows and represented by symbols.
- Using techniques like SAX to reduce time series data to symbols from a finite symbol space, allowing for dimensionality reduction and efficient storage and processing.
- The SAX algorithm, which discretizes time series windows using breakpoints derived from a Gaussian distribution to map windows to symbols while preserving distances between time series.
A time series is a collection of observations made sequentially in time.
Researchers have proposed various methodologies to represent time series more efficiently, including dimensionality reduction and numerosity reduction techniques.
Dimensionality reduction techniques such as the Discrete Wavelet Transform (DWT) and the Discrete Fourier Transform (DFT) approximate a time series with a small number of coefficients, while requiring less storage space.
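As a rough illustration of the DFT idea, the sketch below (a minimal NumPy version with an arbitrary synthetic signal; the function name is mine) keeps only the first few Fourier coefficients and reconstructs an approximation from them:

```python
import numpy as np

def dft_reduce(series, n_coeffs):
    """Approximate a series by keeping only its first n_coeffs
    Fourier coefficients (a crude DFT-based reduction)."""
    spectrum = np.fft.rfft(series)
    truncated = np.zeros_like(spectrum)
    truncated[:n_coeffs] = spectrum[:n_coeffs]   # store n_coeffs numbers, not len(series)
    return np.fft.irfft(truncated, n=len(series))

t = np.linspace(0, 2 * np.pi, 128, endpoint=False)
signal = np.sin(t) + 0.1 * np.sin(10 * t)        # slow trend + small fast ripple
approx = dft_reduce(signal, 4)                   # 4 coefficients instead of 128 samples
error = np.max(np.abs(signal - approx))          # only the small ripple is lost
```

Keeping the low-frequency coefficients preserves the overall shape of the series; the discarded high-frequency component accounts for the small reconstruction error.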
Another line of research on time series representation focuses on converting numeric values into symbolic form.
SAX adopts both dimensionality reduction (DR) and numerosity reduction (NR) techniques.
Data reduction: obtain a reduced representation of the data set that is much smaller in volume yet produces the same (or almost the same) analytical results.
Why data reduction? A database/data warehouse may store terabytes of data, and complex data analysis may take a very long time to run on the complete data set.
Dimensionality reduction, e.g., remove unimportant attributes
Dimensionality reduction
◦ Avoid the curse of dimensionality
◦ Help eliminate irrelevant features and reduce noise
◦ Reduce time and space required in data mining
◦ Allow easier visualization
Numerosity reduction (some simply call it: Data Reduction)
Reduce data volume by choosing alternative, smaller forms of data representation
Parametric methods (e.g., regression)
◦ Assume the data fits some model, estimate the model parameters, store only the parameters, and discard the data (except possible outliers)
◦ Ex.: Log-linear models: obtain the value at a point in m-D space as the product of appropriate marginal subspaces
Non-parametric methods
◦ Do not assume models
◦ Major families: histograms, clustering, sampling
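A minimal sketch of the parametric idea, using a synthetic linear data set (the numbers and variable names here are illustrative assumptions): fit a regression model and keep only its parameters instead of the raw points.

```python
import numpy as np

# Parametric numerosity reduction: assume the data fits a linear model,
# estimate the parameters, and store only those two numbers.
rng = np.random.default_rng(42)
x = np.arange(100, dtype=float)
y = 3.0 * x + 7.0 + rng.normal(0.0, 0.5, size=x.size)   # noisy linear data

slope, intercept = np.polyfit(x, y, deg=1)  # 100 observations reduce to 2 parameters
approximation = slope * x + intercept       # values can be regenerated on demand
```

The 100 stored values are replaced by two parameters; the fit recovers the underlying slope and intercept up to the noise level.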
Data compression
Values that have a larger scale will be given an increased weight, even though the other components contribute as well. Feature scaling is a pretty common normalization technique, and what I usually default to unless there is a reason to attempt another technique.
In order to make meaningful comparisons between two time series, both must be normalized.
Data normalization (centering & scaling) tends to help with model convergence/stability when dealing with machine learning algorithms. Feeding ML algorithms input data with wildly different means/variances can slow or prevent model convergence.
If you have multiple inputs and their amplitudes are different, then it is better to normalize your inputs. In other words, if you have inputs with different means and variances, normalization transforms all of them to have zero mean and unit variance. Thus the weight of each input on the output becomes the same. To normalize, subtract the mean of each input from itself and then divide by its standard deviation.
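The subtract-mean, divide-by-std recipe above can be sketched in a few lines (a minimal NumPy version; the function name is mine):

```python
import numpy as np

def z_normalize(x):
    """Subtract the mean and divide by the standard deviation,
    giving a series with zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

z = z_normalize([2.0, 4.0, 6.0, 8.0])
# z now has mean ~0 and standard deviation ~1
```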
Compute the SAX letter by dividing the standard normal distribution into K regions of equal area under the curve and assigning each component of the PAA a letter from the SAX alphabet corresponding to the region indexed by the PAA value. Repeating this for each value of the PAA yields a SAX word of equivalent length to the PAA.
First convert the time series to its PAA representation, then convert the PAA to symbols. This takes linear time.
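A compact sketch of this two-step pipeline (PAA, then symbol mapping), assuming an already z-normalized toy series whose length divides evenly into segments; the breakpoint values are the standard ones for an alphabet of size 4, and the function names are mine:

```python
import numpy as np

def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width
    segment (series length assumed divisible by n_segments)."""
    return np.asarray(series, dtype=float).reshape(n_segments, -1).mean(axis=1)

def sax_word(series, n_segments, breakpoints, alphabet="abcd"):
    """Map each PAA mean to the letter of the breakpoint region it falls in."""
    means = paa(series, n_segments)
    # side="right" so values >= a breakpoint fall into the higher region
    return "".join(alphabet[i] for i in np.searchsorted(breakpoints, means, side="right"))

breakpoints = [-0.6745, 0.0, 0.6745]                # equal-area regions of N(0, 1)
z = [-1.2, -1.0, -0.4, -0.2, 0.1, 0.3, 1.0, 1.4]   # toy z-normalized series
word = sax_word(z, 4, breakpoints)                  # one pass over the data: linear time
```

Both steps visit each value a constant number of times, which is where the linear running time comes from.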
Normalization to zero mean and unit energy.
The procedure ensures that all elements of the input vector are transformed into an output vector whose mean is approximately 0 and whose standard deviation is close to 1. The formula behind the transform is: z_i = (x_i − µ) / σ, where µ is the mean and σ is the standard deviation of the input vector.
z-normalization is an essential preprocessing step which allows an algorithm to focus on the structural similarities/dissimilarities rather than on the amplitude.
It is assumed that the normalised time series has a Gaussian distribution. Next, the so-called 'breakpoints' are determined that will produce k equal-sized areas under the standard normal curve, shown with coloured dotted lines in the second figure. All PAA coefficients that are below the smallest breakpoint are mapped to the symbol 'a', all coefficients greater than or equal to the smallest breakpoint and less than the second-smallest breakpoint are mapped to the symbol 'b', and so on. Have a look at Fig. 2 to see what is going on.
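The breakpoints themselves are just quantiles of the standard normal; a stdlib-only sketch (using Python's `statistics.NormalDist`; the function name is mine) of computing them for an alphabet of size k:

```python
from statistics import NormalDist

def sax_breakpoints(k):
    """The k - 1 breakpoints splitting N(0, 1) into k equal-area regions:
    the normal quantiles at i/k for i = 1 .. k - 1."""
    nd = NormalDist(0.0, 1.0)
    return [nd.inv_cdf(i / k) for i in range(1, k)]

bp = sax_breakpoints(4)   # roughly [-0.6745, 0.0, 0.6745]
```

Each of the k regions then has probability 1/k under the standard normal curve, so symbols occur with roughly equal frequency on z-normalized data.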
Normal distribution: f(x) = (1 / (σ√(2π))) exp[−(x − µ)² / (2σ²)].
Skewness = (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)³ / s³.
Kurtosis = (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)⁴ / s⁴.
where x̄ is the mean, s is the standard deviation, and n is the length of the time series.
Remarks
Skewness is a measure of the asymmetry of the probability density function.
This assignment is done by dividing the standard normal distribution into K regions of equal area under the curve, and then assigning the letter corresponding to the region in which the value lies. This results in an array of length N, each component being a value between 0 and K − 1, which can be treated as a symbol.
Kurtosis is a measure of the flatness of the probability density function.
The normal (Gaussian) distribution exhibits zero skewness and a kurtosis value of 3.
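The 1/n moment formulas above translate directly to code; a sketch (the function names are mine) that checks them on a large Gaussian sample, where skewness should come out near 0 and kurtosis near 3:

```python
import numpy as np

def skewness(x):
    """Third standardized moment, using the population std (the 1/n formula)."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** 3) / x.std() ** 3

def kurtosis(x):
    """Fourth standardized moment (not excess kurtosis)."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** 4) / x.std() ** 4

sample = np.random.default_rng(0).normal(size=100_000)
# For a Gaussian sample: skewness close to 0, kurtosis close to 3
```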